Foreword 



Is computing an experimental science? For the roots of program optimization 
the answer to this question raised by Robin Milner ten years ago is clearly 
yes: it all started with Donald Knuth's extensive empirical study of Fortran 
programs. This benchmark-driven approach is still popular, but it has in 
the meantime been complemented by an increasing body of foundational 
work, based on varying idealizing assumptions ranging from 'space is for free' 
over 'there are sufficiently many registers' to 'programs consist of 3-address 
code'. Evaluation of the adequacy of these assumptions lacks the appeal of 
run-time measurements for benchmarks, which are so simple that one easily 
forgets about the difficulties in judging how representative the chosen set of 
benchmarks is. Ultimately, optimizations should pass the (orthogonal) tests 
of both communities. 

This monograph is based on foundational, assumption-based reasoning, 
but it evolved under the strong pressure of the experimental community, who 
expressed doubts concerning the practicality of the underlying assumptions. 
Oliver Rüthing responded by solving a foundational problem that seemed 
beyond the range of efficient solutions, and proposed a polynomial algorithm 
general enough to overcome the expressed concerns. 

Register Pressure. A first formally complete solution to the problem of 
register pressure in code motion (hoisting computations enlarges the 
corresponding lifetime ranges) was proposed for 3-address code. This 
assumption allowed a separate treatment of single operator expressions in terms of 
a bit vector analysis. 

The algorithm, although it improves on all previous approaches, was 
criticized for not taking advantage of the flexibility provided by complex 
expression structures, which essentially boils down to the following trade-off 
patterns: 

— if (two) operand expressions are only used once, within one large expres- 
sion, one should hoist its evaluation and release the registers holding the 
operand values; 

— if there are multiple uses of the operand expressions, then one should keep 
the operand values and delay the evaluation of the large expressions. 

Based on matching theory, Rüthing proposes an algorithm that optimally 
resolves this 'trade-off' problem in polynomial time. 







Interacting Transformations. Optimizing transformations may support and/or 
impede each other, as illustrated by the two trade-off patterns in the previous 
paragraph: hoisting a large expression is supportive in the first but impeding 
in the second. In this sense, the corresponding optimal algorithm can 
be regarded as a complete solution to a quite complex interaction problem. 
In this spirit, Oliver Rüthing additionally investigates the complexity and 
the interaction potential of assignment motion algorithms comprising both 
hoisting and sinking, and establishes a surprisingly low complexity bound for 
the iteration cycle, resolving all the so-called second-order effects. 

Finally, the monograph sketches how these two results can be combined in 
order to achieve independence of the assignment granularity. In particular, 
the combined algorithm is invariant under assignment decomposition into 3- 
address code, as required for many other optimization techniques. This is of 
high practical importance, as this increased stability under structural changes 
widens the range of application while maintaining the optimizing power. I 
am optimistic that conceptual results like this, which seriously address the 
concerns of the experimental community, will help to establish fruitful cross- 
community links. 

Summarizing, this monograph, besides providing a comprehensive account 
of the practically most accepted program analysis and transformation meth- 
ods for imperative languages, stepwise develops a scenario that overcomes 
structural restrictions that had previously been attacked for a long time with 
little success. In order to do justice to the conceptual complexity behind this 
breakthrough, Rüthing provides all the required formal proofs. They are not 
always easy to follow in full detail, but the reader is not forced down to the 
technical level. Rather, details can be consulted on demand, providing students 
with a deep, yet intuitive and accessible introduction to the central principles 
of code motion, compiler experts with precise information about the obsta- 
cles when moving from the 3-address code to the general situation, and the 
algorithms community with a striking application of matching theory. 



Bernhard Steffen 




Preface 



Code motion techniques are integrated in many optimizing production and 
research compilers and are still a major topic of ongoing research in pro- 
gram optimization. However, traditional methods are restricted by the narrow 
viewpoint on their immediate effects. A more aggressive approach calls for 
an investigation of the interdependencies between distinct component trans- 
formations. 

This monograph shows how interactions can be used successfully in the design 
of techniques for the movement of expressions and assignments that result 
in tremendous transformational gains. For expression motion we present the 
first algorithm for computationally and lifetime optimal placement of 
expressions that copes adequately with composite expressions and their 
subexpressions. This algorithm is further adapted to situations where large expressions 
are split into sequences of assignments. The core of the algorithm is based 
upon the computation of maximum matchings in bipartite graphs which are 
used to model trade-off situations between distinct lifetime ranges. 

Program transformations based upon assignment motion are character- 
ized by their mutual dependencies. The application of one transformation 
exposes additional opportunities for others. We present simple criteria that 
guarantee confluence and fast convergence of the exhaustive transformational 
process. These criteria apply to a number of practically relevant techniques, 
like the elimination of partially dead or faint assignments and the uniform 
elimination of partially redundant expressions and assignments. 

This monograph is a revised version of my doctoral dissertation which was 
submitted to the Faculty of Engineering of the Christian-Albrechts University 
at Kiel and accepted in July 1997. 
1. Introduction 



Traditionally, program optimisation stands for various kinds of program 
transformations intended to improve the run-time performance of the gen- 
erated code. Optimising transformations are usually applied as intermediate 
steps within the compilation process. More specifically, an optimising com- 
piler principally proceeds in three stages [Muc97]: 

— The front end translates the source program into an intermediate form. 
Usually, intermediate level programs are represented in terms of flow graphs 
built of quite elementary, yet machine-independent instructions. 

— The optimiser performs several transformations on the intermediate level, 
each addressing a particular optimisation goal of interest. 

— The back end finally translates the transformed intermediate program into 
machine-specific code. This stage particularly comprises typical machine- 
dependent tasks like register allocation [Cha82, CH90, Bri92b] and instruc- 
tion scheduling [BR91, GM86]. 

Optimising program transformations should preserve the semantics^ of the 
argument program, while improving its run-time efficiency. Ideally, the 
improvement is witnessed by a formal optimality criterion. Since the 
intermediate level languages under consideration are computationally universal, 
almost all program properties of interest are in general undecidable. For many 
program properties this is true even if conditions are treated in a 
nondeterministic way [RL77], i.e. the conditions are not evaluated and each 
program path is considered executable. To force decidability, and furthermore 
to gain efficiency, further assumptions are introduced that abstract from 
certain aspects of the program behaviour. Common assumptions are, for instance: 

— Assignments change the values of all expressions having the left-hand side 
variable as an operand. 

— There is an infinite number of (symbolic) registers. 

— Operations are significantly more expensive than register transfers. 



^ This requirement addresses both the preservation of total correctness and the 
preservation of partial correctness. A more detailed discussion on this topic with 
respect to the transformation presented in the monograph can be found in Chap- 
ter 7. 



O. Ruething: Interacting Code Motion Transformations, LNCS 1539, pp. 1-8, 1998. 
© Springer-Verlag Berlin Heidelberg 1998 
The usage of such assumptions has a number of advantages. First it allows 
us to gather information by means of efficient standard techniques for static 
program analysis [Hec77, MJ81, CC77, Nie86]. Moreover, the assumptions 
can often be organised in a way that expresses a hierarchical structure of 
concerns. Certain aspects are neglected in a first approximation, but investi- 
gated separately later on. Finally, the use of formal optimality criteria makes 
alternative approaches comparable on a clean abstract basis. 



1.1 Interacting Program Optimisations 

It is well-known that in program optimisation one is often faced with massive 
interactions between distinct optimisation techniques. One optimising trans- 
formation can create as well as destroy opportunities for other optimising 
transformations. Usually the dependencies are resolved at most heuristically: 
an ad-hoc phase ordering [Muc97, ASU85] among the pool of optimisations 
is chosen based upon the most striking dependencies, often resulting from 
empirical observations. In some cases the same optimisation technique may 
be applied several times in order to cover the potential introduced by other 
transformations. In fact, there are almost no approaches that systematically 
exploit the dependencies between distinct optimisations. Whitfield and Soffa 
[WS90, WS97] use pre- and post-conditions of optimisations as interface spec- 
ifications in order to capture those optimisations that are enabled or disabled 
by other ones. Unfortunately, their proposal reveals application orders that 
do not take into account cyclic dependencies between distinct optimisations.^ 
Click and Cooper [CC95, Cli95] present a framework for combining optimi- 
sations by combining their underlying data flow analyses. They demonstrate 
their framework for an extension of Wegman’s and Zadeck’s algorithm for 
computing conditional constants [WZ85, WZ91], where constant propaga- 
tion is combined with the elimination of unreachable branches of conditional 
statements. Both transformations mutually benefit from each other: the de- 
tection of constants may exhibit more accurate information on branching 
conditions, while the detection of unreachable branches may lead to the de- 
tection of further constants. Essentially, their proposal is based on the com- 
bination of two monotone data flow analyses [KU77] by means of monotone 
transfer functions. Under these conditions a combination can reach results 
that cannot be obtained by exhaustive application of the individual trans- 
formations.^ However, Click’s and Cooper’s approach is only applicable to 
optimising transformations where the underlying data flow information is 



Except for the repeated application of a single technique in isolation. 

® Such phenomena have been observed before. Giegerich et al. [GMW81] showed 
that repeated dead code elimination can be collapsed to a single data flow 
analysis (faint code analysis) that in some cases outperforms the results of 
iterated dead code elimination. 
pinned to fixed program points, and thus cannot be used for code motion 
transformations. 

This monograph focuses on interactions in code motion. In particular, 
we examine whether the optimisation potential can be exhausted 
completely, if so, whether the results are independent of the application 
order, and what the costs are in terms of computational complexity. 



1.2 Interacting Code Motion Transformations 

Code motion is an important technique in program optimisation whose 
impact rests on the idea of moving code to program places where its execution 
frequency is reduced or, more importantly, where it can be eliminated because 
the movement exposes opportunities for suitable elimination techniques to 
become applicable. Nowadays, code motion algorithms have found their way 
into many production compilers and are still a main concern of current 
research in program optimisation. 

Nonetheless, traditional methods are often limited by the narrow view on 
their immediate effects. A more aggressive approach to code motion, by 
contrast, requires investigating the interdependencies between distinct 
elementary transformations, too. In essence, interdependencies show up in 
two different flavours: 

Order dependencies: this addresses situations where an improvement that 
can be reached by the sequential execution of two component transfor- 
mations is out of the scope of their individual impacts. Typically, this 
happens if one transformation exposes opportunities for the other one. 
Commonly, such phenomena are known as second order effects and their 
resolution provides an enormous potential for transformational gains. 
Structure dependencies: this addresses situations where the result that can 
be reached by investigating a more complex structure^ cannot naively be 
combined from transformations operating on smaller entities. This kind of 
interdependency is even more serious, as the “naive” combination is often 
not only suboptimal but even inadequate. 

We will investigate prominent and practically relevant examples for both 
kinds of interactions and show how their interdependencies can be 
resolved completely, exploiting the optimisation potential as far as possible. 
To this end we give a taxonomy of (syntactic) code motion and present pow- 
erful extensions to standard methods where dependencies among elementary 
transformations play a crucial role. 



^ That means the object of the transformation under investigation, e.g., a single 
expression (assignment) or a set of expressions (assignments). 
1.2.1 Interactions in Expression Motion 

Expression motion, which addresses techniques that are concerned with the 
movement of right-hand side parts of instructions in a program only, has 
thoroughly been studied in program optimisation [Cho83, Dha83, Dha88, 
Dha89b, Dha91, DS88, JD82a, JD82b, Mor84, MR79, MR81, Sor89]. Its 
primary goal is to eliminate partially redundant expressions, i. e. expressions 
that are unnecessarily reevaluated on some program paths at run-time. This 
is achieved by replacing the original computations of a program by tempo- 
rary variables (symbolic registers) that are initialised correctly at suitable 
program points. It has been known for decades that partial redundancies can be 
eliminated as far as possible.® The resulting programs are computationally 
optimal, i. e. on every complete program path the number of evaluations of 
a program term is reduced to a minimal amount. On the other hand, the 
primary goal leaves plenty of room for distinct computationally optimal pro- 
grams. At this point a secondary aspect of expression motion comes into play. 
By definition, the gain of expression motion comes at the price that values 
have to be stored in a private symbolic register. In fact, using the idea to 
separate the concerns of expression motion, the primary goal, i. e. reaching 
computational optimality, is investigated under the assumption that there 
is an unbounded number of such symbolic registers. More realistically, how- 
ever, registers - like other resources - are limited. Symbolic registers finally 
have to be mapped onto the set of available machine registers. This task is 
commonly known as register allocation [Cha82, CH90, Bri92b]. Consequently, 
an uneconomic usage of symbolic registers reveals its harmful impact in 
the register allocation phase, where it leads to register pressure that finally 
causes the generation of spill code® slowing down the program. Hence, a 
secondary goal of expression motion is to use the resource of symbolic registers 
as economically as possible. In more technical terms, this requirement means 
keeping the lifetime ranges of the symbolic registers as small as possible. In 
[KRS92] we developed an algorithm for lazy expression motion which was 
the first one to be computationally as well as lifetime optimal for a single 
expression under investigation. This single-expression view is typical for 
algorithms in expression motion. It is motivated by the fact that an extension 
to multiple expressions is straightforward, if the set of expressions is flat, 
i.e. the set does not contain both expressions and their subexpressions at 
once. In this case a simultaneous algorithm is essentially determined as the 
independent combination of all individual transformations. Such a situation 
is for instance given for intermediate representations of programs where ex- 
pressions are completely decomposed into three-address format. Even though 
such an assumption seems attractive at first glance, one should keep in mind 

® More precisely, "as far as possible" refers to a formal optimality criterion 
that is commonly accepted for syntactic expression motion. 

® That is, code used in order to store registers into main memory and to reload 
them. 
that splitting up large expressions comes at the price of weakening the po- 
tential for expression motion. This is because the decomposition introduces 
a bunch of assignment statements which may block the movement process. 

Hence, extensions that are able to cope with expression sets with a 
non-flat structure are much more appealing. Whereas the primary focus on 
computational optimality is easy to preserve with a careful combination of 
elementary transformations, considerations with respect to lifetimes of sym- 
bolic registers are heavily affected. Here we are faced with subtle trade-offs 
between the lifetimes of different symbolic registers that cannot be resolved 
from the isolated point of view with respect to single-expression transfor- 
mations. Hence this problem is a model for structure dependencies in code 
motion. 

As a main result we present the first algorithm for lifetime optimal expression 
motion that adequately copes with composite expressions and their subex- 
pressions at once. The central idea is to model trade-off situations among 
expressions at each program point by means of bipartite graphs and then to 
determine optimal trade-offs by means of graph matching techniques [LP86]. 
Fortunately, the local view on trade-offs between lifetimes of temporaries is 
well-behaved with respect to its globalisation leading to a refinement based 
approach that can be sketched as a three-step procedure: 

1. Perform the traditional data-flow analyses gathering the information for 
the computationally and lifetime optimal movement of a single expres- 
sion. 

2. Compute for each program point a most profitable trade-off between 
lifetimes of symbolic registers. Informally, such a trade-off is based on 
two sets of expressions: 

i) The set of expressions R that definitely occupy a symbolic register on 
entering the given program point, but whose registers can be released, 
since they are only used for initialisations of expressions of kind ii). 

ii) The set of expressions I that may be initialised at the program point 
such that their values have to be stored in a symbolic register for a 
later usage. 

If R is larger than I, it is profitable to release the R-associated 
lifetimes at the cost of introducing new I-associated ones. Moreover, a 
most profitable trade-off is one for which the difference between 
the cardinalities of the sets R and I is maximal. 

3. Adjust the information computed in the first step by means of the local 
information gathered in the second step. 
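Step 2 can be made concrete with a small sketch. It assumes a simplified model of our own, not necessarily the monograph's exact construction: a bipartite graph whose side S holds expressions currently occupying a register and whose side T holds expressions that could be initialised, with an edge whenever an expression of S is an operand of one in T. Choosing X ⊆ S as R forces I to contain neigh(X), so a most profitable trade-off maximises |X| - |neigh(X)|. One standard way to find such a set, taken from matching theory, is to compute a maximum matching (here with Kuhn's augmenting-path method) and collect the S-vertices reachable from unmatched ones along alternating paths. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Node = Hashable


def max_matching(S: Iterable[Node], adj: Dict[Node, List[Node]]) -> Dict[Node, Node]:
    """Maximum bipartite matching via Kuhn's augmenting paths.

    adj maps every s in S to its neighbours in T. Returns match_t: t -> s.
    """
    match_t: Dict[Node, Node] = {}

    def augment(s: Node, seen: Set[Node]) -> bool:
        for t in adj.get(s, []):
            if t in seen:
                continue
            seen.add(t)
            # t is free, or its current partner can be rematched elsewhere
            if t not in match_t or augment(match_t[t], seen):
                match_t[t] = s
                return True
        return False

    for s in S:
        augment(s, set())
    return match_t


def most_profitable_tradeoff(
    S: List[Node], adj: Dict[Node, List[Node]]
) -> Tuple[Set[Node], Set[Node]]:
    """Return (X, neigh(X)) with X a subset of S maximising |X| - |neigh(X)|."""
    match_t = max_matching(S, adj)
    matched_s = set(match_t.values())
    # Start the alternating search at every unmatched S-vertex.
    frontier = deque(s for s in S if s not in matched_s)
    X: Set[Node] = set(frontier)
    NX: Set[Node] = set()
    while frontier:
        s = frontier.popleft()
        for t in adj.get(s, []):
            if t in NX:
                continue
            NX.add(t)
            # t must be matched; otherwise an augmenting path would exist.
            partner = match_t[t]
            if partner not in X:
                X.add(partner)
                frontier.append(partner)
    return X, NX
```

Take operands a and b that are used only by a + b, as in the first pattern of the foreword. In that case the sketch returns X = {a, b} and neigh(X) = {a + b}, so releasing two registers for one pays off. If a and b also feed a * b, the matching covers both operands and the most profitable trade-off is empty, so the operands are kept. The gain |X| - |neigh(X)| always equals the number of S-vertices left unmatched by a maximum matching.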

1.2.2 Interactions in Assignment Motion 

Assignment motion complements expression motion by incorporating the 
movement of left-hand side variables of assignments as well. Although such 
an extension may appear straightforward at first glance, the movement of 
left-hand sides of assignments induces second order effects, i.e. one is now 
faced with order dependencies whose resolution requires the iterated applica- 
tion of the component transformations. The consequences are twofold: on the 
one hand the iterated application reveals much more optimisation potential 
than a one-step procedure; on the other hand, iteration has its price in terms 
of extra costs. 

Another essential difference to expression motion is the fact that not only 
code hoisting, i.e. backward movement of code, but also code sinking, i.e. 
forward movement of code, becomes important. This observation led to the 
development of a technique for partial dead code elimination and partial faint 
code elimination [KRS94b] that complements expression motion. 

This monograph gives a systematic approach to the phenomena of interacting 
program transformations that are based upon assignment motion. To this 
end a general framework is developed in which criteria for the following two 
questions of interest are examined: 

Confluence: do different iteration sequences eventually collapse in a unique 
result? 

Complexity: what are the extra costs in terms of a penalty factor compared 
to a single-step application of the elementary components? 

We provide simple criteria that guarantee confluence and fast convergence for the 
exhaustive iteration of elementary transformations. 

Finally, we also investigate structure dependencies in assignment motion by 
adapting the lifetime considerations in expression motion to the situation in 
assignment motion. In fact, our technique for minimising lifetime ranges can 
be directly adapted to the assignment motion situation. This way our method 
becomes applicable even if large expressions are already split at the interme- 
diate level, which is the situation of highest practical relevance. Even better, 
the phase that is responsible for minimising the lifetime ranges of temporaries 
can be completely decoupled from the setting of partial redundancy elimina- 
tion. This results in a stand-alone technique for resolving register pressure 
before register allocation takes place. At each program point the number of 
symbolic registers that are required to hold values is reduced to a minimum 
which tremendously eases the starting situation of the register allocator. 



1.3 Code Motion in the Presence of Critical Edges 

Critical edges [MR79, Dha88, SKR90, Dha91, DRZ92] in flow graphs, i.e. 
edges leading directly from branch-nodes to join-nodes, are known to cause 
massive problems for code motion. Essentially, the reason for all these 
problems is that critical edges prevent run-time situations arising on distinct 
program paths from being decoupled properly. From a theoretical point of view the 
problem can be easily overcome by introducing an empty synthetic node on 
every critical edge, a technique that is commonly known as edge splitting. 
On the other hand, in practice moving code to split nodes has also some 
drawbacks, as, for instance, additional unconditional jumps are introduced. 
In the light of this dilemma implementors are sometimes unwilling to split 
critical edges. Therefore, the problems due to critical edges are well-studied. 
The two most prominent drawbacks are: 

— Critical edges may cause poor transformations due to the lack of suitable 
placement points. 

— Critical edges may impose higher solution costs of the associated data flow 
analyses due to the usage of bidirectional equations. 
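Critical edges and edge splitting can be illustrated with a short Python sketch of our own. A flow graph is given as a set of (source, target) pairs. A node with more than one successor counts as a branch node, and a node with more than one predecessor counts as a join node. The naming scheme for synthetic nodes is hypothetical.

```python
from collections import Counter
from typing import Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def critical_edges(E: Set[Edge]) -> Set[Edge]:
    """Edges leading from a branch node (several successors) to a join node
    (several predecessors)."""
    out_deg: Counter = Counter(u for u, _ in E)
    in_deg: Counter = Counter(v for _, v in E)
    return {(u, v) for (u, v) in E if out_deg[u] > 1 and in_deg[v] > 1}


def split_critical_edges(V: Set[Hashable], E: Set[Edge]):
    """Edge splitting: put an empty synthetic node on every critical edge."""
    V, E = set(V), set(E)
    for u, v in critical_edges(E):  # computed once, before E is modified
        s = ("split", u, v)  # hypothetical naming of the synthetic node
        V.add(s)
        E.discard((u, v))
        E |= {(u, s), (s, v)}
    return V, E
```

Splitting creates no new critical edges, because each synthetic node has exactly one predecessor and one successor.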

Whereas the first drawback cannot be remedied, a lot of research [Dha88, 
DK93, DP93, KD94] has addressed the second deficiency. While a restricted 
class of weakly bidirectional problems can be solved as easily as unidirectional 
ones, it was recently shown that bidirectional code motion algorithms are 
inherently more complex than their unidirectional counterparts when following 
a naive round-robin schedule [DP93, KD94]. Surprisingly, the latter result 
is diminished in the light of a new technique presented here: bidirectional 
data flow analyses can completely be avoided if the flow graph is (virtually) 
enriched by shortcuts that are used to bypass nests of critical edges which 
are the reason for slow information flow along “zig-zag” paths. 

While the “classical” deficiencies all concern standard non-interacting code 
motion algorithms, here we additionally investigate the impact of 
critical edges on interacting code motion transformations. Our main 
findings are negative: 

— Critical edges ruin the existence of lifetime optimal expression motion for 
structured sets of expressions. 

— Critical edges significantly slow down the exhaustive iteration process for 
assignment motion based transformations. 

These new results strengthen the argument for splitting critical edges, espe- 
cially since at least the latter deficiency must be apparent in all assignment 
motion based implementations of partial redundancy elimination that do not 
split critical edges. 



1.4 Structure of the Monograph 

The monograph starts with a preliminary chapter devoted to fundamental 
notions and concepts. The remainder is split into two parts reflecting 
our two major themes on interacting code motion transformations. Part 
1 comprises Chapters 3 to 5 and deals with expression motion. Chapter 3 
starts by investigating the standard situation for the single-expression view 
of expression motion. This chapter essentially provides a survey of the main 
results of [KRS94a]. Chapter 4 then deals with the extension to the 
multiple-expression situation. As the major achievement we present an algorithm for 
computationally and lifetime optimal expression motion that works for ar- 
bitrary structured sets of expressions. To this end, a significant part of the 
chapter prepares the graph theoretical background on which our local 
trade-off decisions are based. Central is the development of a simple graph 
theoretical algorithm for computing tight sets in a bipartite graph. For the 
sake of presentation the algorithm reaching full lifetime optimality is pre- 
ceded by a weaker variant that advances level by level within the universe 
of expressions starting with the minimal ones. In this process each transition 
between levels is optimised on its own, i.e. based on the assumption that 
decisions made for prior levels are fixed. A key observation is that 
the central ideas of the levelwise approach could also be utilised to solve the 
full problem which, at first glance, seemed to be much more intricate. In fact, 
the central idea for this generalisation is a reduction of an arbitrary struc- 
tured universe of expressions to a two-level situation that actually suffices for 
the entire reasoning on register trade-offs. Chapter 5 closes the first part by 
investigating the impact of critical edges for both the single-expression view 
and the multiple-expression view. 

Part 2, which comprises Chapters 6 to 8, deals with assignment 
motion and is organised similarly to the first part. In Chapter 6 we start by 
presenting the main applications of assignment motion, among which partial 
dead code elimination [KRS94b] and the uniform elimination of assignments 
and expressions [KRS94b] play a predominant role. Afterwards, Chapter 7 
introduces a uniform framework for assignment motion based program 
transformations that is applicable in all relevant situations. As in the first part, 
the final chapter of the second part is devoted to the impact of critical edges. 

Chapter 9 provides a summary of the results, and investigates directions 
for future research. 




2. Basic Formalisms and Definitions 



This chapter introduces the basic formalisms that are used throughout 
this monograph. 



2.1 Graph Theoretical Foundations 

Because graph theoretical concepts play an important role for various tasks, 
for instance for modelling programs or for reasoning about the lifetimes of 
symbolic registers, we briefly sum up the most relevant basic notions and 
concepts. 



2.1.1 Undirected Graphs 

An undirected graph is a pair (V, E), where V is a set of vertices and 
$E \subseteq \mathcal{P}_{1,2}(V)$ a set of edges.^ For an undirected graph G = (V, E) and a vertex 
$v \in V$ the set of neighbours of v is defined by 

$$\mathit{neigh}_G(v) \stackrel{\mathrm{def}}{=} \{\, w \mid \{v, w\} \in E \,\}$$

If the underlying graph G is understood from the context, the index might be 
omitted. Moreover, $\mathit{neigh}_G$ is naturally extended to sets of vertices $M \subseteq V$: 

$$\mathit{neigh}_G(M) \stackrel{\mathrm{def}}{=} \bigcup_{v \in M} \mathit{neigh}_G(v)$$

An undirected graph (V, E) is called bipartite if and only if there are two sets 
of vertices S and T such that $V = S \uplus T$ and every edge in E is incident to 
both a vertex in S and a vertex in T.^ For convenience, bipartite graphs will 
often be given in a notation $(S \uplus T, E)$ that already reflects the bipartition 
of the set of vertices. 
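The neighbourhood and bipartiteness notions translate directly into code. In this sketch (our own representation), an undirected edge is a frozenset with one or two elements. A one-element edge {v} is a loop, so by the definition v is its own neighbour. The helper checks a given bipartition rather than searching for one.

```python
from typing import FrozenSet, Hashable, Iterable, Set

Vertex = Hashable
Edge = FrozenSet[Vertex]  # a one- or two-element subset of V


def neigh(E: Set[Edge], v: Vertex) -> Set[Vertex]:
    """neigh_G(v) = { w | {v, w} in E }."""
    result: Set[Vertex] = set()
    for e in E:
        if v in e:
            rest = e - {v}
            # A one-element edge {v} is a loop, so v is its own neighbour.
            result |= (rest if rest else {v})
    return result


def neigh_set(E: Set[Edge], M: Iterable[Vertex]) -> Set[Vertex]:
    """Extension to vertex sets: the union of neigh_G(v) over v in M."""
    return set().union(*(neigh(E, v) for v in M))


def is_bipartition(V: Set[Vertex], E: Set[Edge], S: Set[Vertex], T: Set[Vertex]) -> bool:
    """True iff V is the disjoint union of S and T and every edge has
    exactly one end in S and one in T (so loops are excluded)."""
    if S & T or S | T != V:
        return False
    return all(len(e & S) == 1 and len(e & T) == 1 for e in E)
```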



^ $\mathcal{P}_{1,2}(V)$ denotes the set of one- and two-element subsets of 
V. Hence an edge of an undirected graph is a subset like $\{v, w\}$ with $v, w \in V$. 

^ $\uplus$ stands for set union of disjoint sets. 
2.1.2 Directed Graphs 

A directed graph is a pair (V, E), where V is a set of vertices and $E \subseteq V \times V$ 
is a set of directed edges. For a directed graph the set of neighbours of a vertex 
v is divided into two classes: the set of successors $\mathit{succ}_G(v)$ defined by 

$$\mathit{succ}_G(v) \stackrel{\mathrm{def}}{=} \{\, w \in V \mid (v, w) \in E \,\}$$

and the set of predecessors $\mathit{pred}_G(v)$ defined by 

$$\mathit{pred}_G(v) \stackrel{\mathrm{def}}{=} \{\, w \in V \mid (w, v) \in E \,\}$$

By analogy to $\mathit{neigh}_G$, also $\mathit{succ}_G$ and $\mathit{pred}_G$ can be extended to sets of 
vertices. 

2.1.2.1 Paths. A finite path p in a directed graph G = (V, E) is a sequence 
of vertices $\langle v_1, \ldots, v_k \rangle$ ($k \geq 0$), where 

$$\forall\, 1 \leq i < k.\ v_{i+1} \in \mathit{succ}(v_i)$$

$\langle\rangle$ is the empty path and we use $|p|$ to refer to the length of path p, which is 
k for $p = \langle v_1, \ldots, v_k \rangle$. The set of all finite paths of G is denoted by $\mathbf{P}_G$ and 
the set of finite paths leading from a vertex v to a vertex w is denoted by 
$\mathbf{P}_G[v, w]$. Furthermore, we define 

$$\mathbf{P}_G[v, w[ \;\stackrel{\mathrm{def}}{=}\; \bigcup_{w' \in \mathit{pred}(w)} \mathbf{P}_G[v, w']$$

$$\mathbf{P}_G]v, w] \;\stackrel{\mathrm{def}}{=}\; \bigcup_{v' \in \mathit{succ}(v)} \mathbf{P}_G[v', w]$$

In situations where G is understood from the context the index G may be 
omitted. 

For a given path $p \in \mathbf{P}_G$ and an index $1 \leq i \leq |p|$ the i-th component of p is 
addressed by $p_i$. A path $q \in \mathbf{P}_G$ is said to be a subpath of p, in symbols $q \sqsubseteq p$, 
if there is an index $1 \leq i \leq |p|$ such that $i + |q| - 1 \leq |p|$ and $q_j = p_{i+j-1}$ for 
all $1 \leq j \leq |q|$. 

A path p is a cycle if $p_1 = p_{|p|}$. A directed graph G for which $\mathbf{P}_G$ does not 
contain any cycle is called acyclic or for short a DAG (Directed Acyclic 
Graph). 
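The following sketch mirrors these definitions for a finite graph given as a set of edge pairs. It checks membership in $\mathbf{P}_G$ rather than enumerating $\mathbf{P}_G$, which is infinite as soon as G has a cycle. It uses 0-based indices. Acyclicity is tested with Kahn's topological-sort algorithm, a standard technique chosen by us rather than taken from the monograph. In is_cycle we additionally require |p| > 1, an assumption of this sketch, since every single-vertex path trivially satisfies $p_1 = p_{|p|}$.

```python
from collections import deque
from typing import Hashable, Sequence, Set, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]


def succ(E: Set[Edge], v: Vertex) -> Set[Vertex]:
    return {w for (u, w) in E if u == v}


def pred(E: Set[Edge], v: Vertex) -> Set[Vertex]:
    return {u for (u, w) in E if w == v}


def is_path(E: Set[Edge], p: Sequence[Vertex]) -> bool:
    """<v1, ..., vk> is a path iff v_{i+1} is a successor of v_i for 1 <= i < k."""
    return all(p[i + 1] in succ(E, p[i]) for i in range(len(p) - 1))


def is_subpath(q: Sequence[Vertex], p: Sequence[Vertex]) -> bool:
    """q is a subpath of p: some start index i of p (0-based here) with
    i + |q| <= |p| and q_j = p_{i+j} for all j."""
    return any(
        tuple(p[i:i + len(q)]) == tuple(q)
        for i in range(len(p))
        if i + len(q) <= len(p)
    )


def is_cycle(p: Sequence[Vertex]) -> bool:
    # Assumption of this sketch: |p| > 1, so a single vertex is no cycle.
    return len(p) > 1 and p[0] == p[-1]


def is_dag(V: Set[Vertex], E: Set[Edge]) -> bool:
    """Kahn's algorithm: G is acyclic iff all vertices can be removed in
    topological order (a self-loop keeps its vertex from ever being ready)."""
    indeg = {v: len(pred(E, v)) for v in V}
    ready = deque(v for v in V if indeg[v] == 0)
    removed = 0
    while ready:
        v = ready.popleft()
        removed += 1
        for w in succ(E, v):
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return removed == len(V)
```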



2.2 Programs and Flow Graphs 

2.2.1 Expressions 

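We consider expressions $\mathcal{E}$ that are inductively built from variables, constants and operators. The set of immediate subexpressions of an expression $\varphi$ is denoted by $\mathit{SubExpr}(\varphi)$. Moreover, this notion can be extended straightforwardly to sets of expressions. Then the set of mediate subexpressions of an expression $\varphi$ is given by

$$\mathit{SubExpr}^*(\varphi) \stackrel{\mathrm{def}}{=} \bigcup_{i \in \mathbb{N}} \mathit{SubExpr}^i(\varphi),$$

where $\mathit{SubExpr}^i(\varphi)$ is inductively defined by $\mathit{SubExpr}^0(\varphi) \stackrel{\mathrm{def}}{=} \{\varphi\}$ and $\mathit{SubExpr}^{i+1}(\varphi) \stackrel{\mathrm{def}}{=} \mathit{SubExpr}^i(\mathit{SubExpr}(\varphi))$. For instance,

$$\mathit{SubExpr}^*((a+b)*(c+d)) = \{(a+b)*(c+d),\; a+b,\; c+d,\; a,\; b,\; c,\; d\}$$

Complementarily, the notion of superexpressions is defined by

$$\mathit{SupExpr}(\varphi) \stackrel{\mathrm{def}}{=} \{\, \psi \in \mathcal{E} \mid \varphi \in \mathit{SubExpr}(\psi) \,\}.$$

Mediate superexpressions $\mathit{SupExpr}^*(\varphi)$ of $\varphi$ are defined analogously to mediate subexpressions $\mathit{SubExpr}^*(\varphi)$.

For a set of expressions $\Phi$ we use $\mathit{SubExpr}_\Phi(\varphi)$, $\mathit{SubExpr}^*_\Phi(\varphi)$, $\mathit{SupExpr}_\Phi(\varphi)$ and $\mathit{SupExpr}^*_\Phi(\varphi)$ as shorthands for $\mathit{SubExpr}(\varphi) \cap \Phi$, $\mathit{SubExpr}^*(\varphi) \cap \Phi$, $\mathit{SupExpr}(\varphi) \cap \Phi$ and $\mathit{SupExpr}^*(\varphi) \cap \Phi$, respectively.

Finally, the minimal and maximal expressions with respect to $\Phi$ are defined by

$$\Phi_{\min} \stackrel{\mathrm{def}}{=} \{\, \varphi \in \Phi \mid \mathit{SubExpr}_\Phi(\varphi) = \emptyset \,\} \quad\text{and}\quad \Phi_{\max} \stackrel{\mathrm{def}}{=} \{\, \varphi \in \Phi \mid \mathit{SupExpr}_\Phi(\varphi) = \emptyset \,\},$$

respectively.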
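A small Python sketch of these notions, under the assumption that an expression is represented either as a name (variable or constant) or as a tuple `(operator, operand_1, ..., operand_k)`. Since $\mathcal{E}$ itself is infinite, superexpressions are only computed relative to a finite set $\Phi$, i.e. the sketch implements $\mathit{SupExpr}_\Phi$ rather than $\mathit{SupExpr}$.

```python
from typing import Set, Tuple, Union

# An expression is a name (variable/constant) or a tuple (operator, operand_1, ..., operand_k).
Expr = Union[str, Tuple]


def sub_expr(e: Expr) -> Set[Expr]:
    """Immediate subexpressions SubExpr(e): the operands of the top-level operator."""
    return set(e[1:]) if isinstance(e, tuple) else set()


def sub_expr_star(e: Expr) -> Set[Expr]:
    """Mediate subexpressions: union of SubExpr^i(e) for i >= 0, so e itself is included."""
    result, level = {e}, {e}
    while level:
        level = {s for f in level for s in sub_expr(f)}  # next level SubExpr^{i+1}
        result |= level
    return result


def sub_expr_in(phi: Set[Expr], e: Expr) -> Set[Expr]:
    """SubExpr_Phi(e) = SubExpr(e) ∩ Phi."""
    return sub_expr(e) & phi


def sup_expr_in(phi: Set[Expr], e: Expr) -> Set[Expr]:
    """SupExpr_Phi(e) = SupExpr(e) ∩ Phi: members of Phi having e as an immediate operand."""
    return {f for f in phi if e in sub_expr(f)}


def minimal(phi: Set[Expr]) -> Set[Expr]:
    """Phi_min: members of Phi without immediate subexpressions in Phi."""
    return {e for e in phi if not sub_expr_in(phi, e)}


def maximal(phi: Set[Expr]) -> Set[Expr]:
    """Phi_max: members of Phi without immediate superexpressions in Phi."""
    return {e for e in phi if not sup_expr_in(phi, e)}


a_plus_b = ("+", "a", "b")
c_plus_d = ("+", "c", "d")
big = ("*", a_plus_b, c_plus_d)  # (a + b) * (c + d)
```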

2.2.2 Statements 

Expressions are used in order to construct statements. With respect to our 
application, code motion, statements are classified into three groups: 

— Assignment statements of the form $v := \varphi$, where $v$ denotes a program variable and $\varphi$ an expression. Assignment statements are responsible for the transfer of values.

— Immobile statements that are assumed to be the unmovable determinants within the program. This, however, only excludes the movement of the statement itself, while its expressions may be subjected to expression motion. For the sake of presentation we consider two types of immobile statements. Output statements of the form $\mathit{out}(\varphi)$ are immobile, since they influence the observable behaviour of the program. Conditional branches of the form $\mathit{cond}(\varphi)$ are immobile, since we assume that the branching structure of the program is preserved.^

^ In practice, some assignment statements, for instance assignments to global variables (variables outside the scope of the flow graph under consideration), should also be treated as immobile statements for safety reasons.
— Irrelevant statements, i.e. statements that are irrelevant for code motion.
For the sake of simplicity we will use the empty statement skip as the only 
representative. 

2.2.3 Flow Graphs 

As is common, imperative programs are represented in terms of directed flow graphs $G = (N, E, s, e)$, where $(N, E)$ is a directed graph and $s \in N$ and $e \in N$ denote the unique start node and end node of $G$. The nodes $n \in N$ represent statements and the directed edges $(m, n) \in E$ represent the nondeterministic branching structure of $G$. $s$ and $e$ are both assumed to represent the empty statement and not to possess any predecessors and successors, respectively. Moreover, every node $n \in N$ is assumed to lie on a path in $\mathbf{P}[s, e]$.^ Finally, we define the program size $|G|$, which is important for reasoning about complexity issues, by $|G| \stackrel{\mathrm{def}}{=} |N| + |E|$. Note that for real-life programs it is reasonable to assume that the edges of a flow graph are somewhat sparse,^ i.e. $O(|N|) = O(|E|)$.

Basic Blocks. Considering nodes that represent basic blocks [Hec77], i.e. maximal linear sequences of statements, reveals an alternative view of the flow graph. Basic block nodes are more appropriate from a practical point of view, while nodes with elementary statements are more appealing for theoretical reasoning. Moreover, basic blocks are more suitable for assignment motions that are applied repeatedly, since, as opposed to the position of statements within the program, which may change significantly, the general branching structure in terms of the basic block flow graph is preserved. To distinguish basic blocks from elementary nodes we will use bold-faced symbols like $\mathbf{n}, \mathbf{m}, \ldots$ for them. Finally, the terms $\mathit{first}(\mathbf{n})$ and $\mathit{last}(\mathbf{n})$ refer to the first and the last instruction associated with a basic block $\mathbf{n}$.^

2.2.4 Program Points 

For a code motion under consideration, relevant program properties refer to the entries and exits of the nodes of $N$. To this end we introduce the set of program points $\dot N$, which contains all entries and exits of nodes in $N$. The usage of program points significantly simplifies our reasoning, because program properties can be specified in a uniform way and we do not have to cope with an artificial distinction between entry and exit situations. In order to distinguish program points in $\dot N$ from nodes in $N$ we follow the convention of using dotted symbols like $\dot n, \dot m, \ldots \in \dot N$ for the former. In particular, the entry of $s$ and the exit of $e$ are denoted by $\dot s$ and $\dot e$, respectively. Regarding the exit point of a node as the immediate successor of its entry point makes the program points $\dot N$ a directed graph, too, which implies that the notions for $\mathit{pred}$, $\mathit{succ}$ and for paths apply as well. Moreover, from a conceptual point of view the statement of a node will be considered to be attached to the entry point, while the exit point is considered to be associated with the empty statement skip.

^ This assumption is fairly standard in data flow analysis. On the one hand, nodes that are not reachable from $s$ can always be eliminated. On the other hand, nodes that do not reach $e$ can be directly connected to $e$, resulting in a program where each path originating in $s$ is made terminating.

^ This condition can always be enforced by assuming that the maximum out-degree of branching nodes is bounded by a constant.

^ An empty basic block is considered to be associated at least with the empty statement skip.
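The construction of the program-point graph can be sketched as follows, assuming nodes are hashable labels and edges a set of pairs; program points are encoded as pairs `(node, "entry")` and `(node, "exit")`, a hypothetical encoding chosen only for this illustration.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Point = Tuple[Hashable, str]  # (node, "entry") or (node, "exit")


def program_point_graph(nodes: Iterable[Hashable], edges: Set[Tuple[Hashable, Hashable]]):
    """Return (succ, pred) over program points.

    entry(n) -> exit(n) for every node n, and exit(m) -> entry(n) for every edge (m, n)."""
    succ: Dict[Point, Set[Point]] = {}
    for n in nodes:
        succ[(n, "entry")] = {(n, "exit")}
        succ[(n, "exit")] = {(k, "entry") for (m, k) in edges if m == n}
    pred: Dict[Point, Set[Point]] = {p: set() for p in succ}
    for p, targets in succ.items():
        for q in targets:
            pred[q].add(p)
    return succ, pred


# Hypothetical three-node flow graph s -> 1 -> e.
succ, pred = program_point_graph(["s", "1", "e"], {("s", "1"), ("1", "e")})
```

In this encoding the start point $\dot s$ is `("s", "entry")`, the only point without predecessors, and $\dot e$ is `("e", "exit")`, the only point without successors.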




Fig. 2.1. Granularity of a flow graph: nodes in form of basic blocks, elementary statements or program points



2.2.5 Code Patterns 

Code patterns refer to the syntactical shape of a piece of code in a flow graph $G$. Important for code motion are code patterns that can be subjected to a movement. In this monograph code patterns of particular interest are expression patterns $\mathcal{EP}(G)$ and assignment patterns $\mathcal{AP}(G)$.^ Typically, $\varphi, \psi, \ldots$ are used to range over expression patterns and $\alpha, \beta, \gamma, \ldots$ to range over assignment patterns. For an assignment pattern $\alpha \in \mathcal{AP}(G)$ the set of $\alpha$-occurrences in $G$ is denoted by $\mathit{Occ}_\alpha(G)$. Typically, symbols like $\mathit{occ}_\alpha, \mathit{occ}_\beta, \mathit{occ}_\gamma, \ldots$ are used to range over elements of $\mathit{Occ}_\alpha(G)$, where the index refers to the corresponding assignment pattern. In some situations we explicitly address path occurrences of an assignment pattern. In contrast to program occurrences, such occurrences refer to an occurrence at a particular position on a path. For a given path $p$ we use $\mathit{occ}_{\alpha,p}, \mathit{occ}_{\beta,p}, \mathit{occ}_{\gamma,p}, \ldots$ for path occurrences of $\alpha, \beta, \gamma, \ldots$, respectively, and denote the set of $p$-path occurrences of $\alpha$ in $G$ by $\mathit{Occ}_{\alpha,p}(G)$. Finally, a given path occurrence $\mathit{occ}_{\alpha,p}$ is projected to its corresponding program occurrence $\mathit{occ}_\alpha$ by omitting the path parameter.



^ Some of the results can also be adapted to more complex code patterns as dis- 
cussed in [GLZ86]. 
2.2.6 Critical Edges

It is well-known that in completely arbitrary graph structures the code motion process may be blocked by critical edges, i.e. by edges leading from nodes with more than one successor to nodes with more than one predecessor (cf. [Dha88, Dha91, DS88, RWZ88, SKR90, SKR91]).

In Figure 2.2(a) the computation of $a + b$ at node 3 is partially redundant with respect to the computation of $a + b$ at node 1. However, this partial redundancy cannot safely be eliminated by moving the computation of $a + b$ to its preceding nodes, because this may introduce a new computation on a path leaving node 2 on the right branch. On the other hand, it can safely be eliminated after inserting a synthetic node $S_{2,3}$ on the critical edge $(2, 3)$, as illustrated in Figure 2.2(b).




Fig. 2.2. Critical edges 



We distinguish between flow graphs possibly with and definitely without critical edges. Accordingly, these two classes of flow graphs are denoted by $\mathcal{G}_{\mathit{Crit}}$ and $\mathcal{G}$, respectively. Throughout this book we will usually consider flow graphs without critical edges. The reason is that every flow graph $G$ with critical edges can be transformed into a corresponding flow graph without critical edges. This procedure, known as edge splitting, is accomplished by inserting a synthetic node associated with the empty statement on every critical edge.
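Edge splitting is easy to sketch. The following Python fragment assumes the flow graph is given by a node set and an edge set without parallel edges; the naming `("S", m, n)` for synthetic nodes is hypothetical. The example graph is only modelled on the situation described for Figure 2.2, whose exact shape is not reproduced here.

```python
from collections import Counter
from typing import Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def critical_edges(edges: Set[Edge]) -> Set[Edge]:
    """Edges from a node with more than one successor to a node with more than one predecessor."""
    out_deg = Counter(m for (m, _) in edges)
    in_deg = Counter(n for (_, n) in edges)
    return {(m, n) for (m, n) in edges if out_deg[m] > 1 and in_deg[n] > 1}


def split_critical_edges(nodes: Set[Hashable], edges: Set[Edge]):
    """Put a fresh synthetic node (empty statement skip) on every critical edge.

    The synthetic node for (m, n) is named ("S", m, n); this naming is only illustrative."""
    crit = critical_edges(edges)
    new_nodes, new_edges = set(nodes), set(edges) - crit
    for (m, n) in crit:
        synthetic = ("S", m, n)
        new_nodes.add(synthetic)
        new_edges |= {(m, synthetic), (synthetic, n)}
    return new_nodes, new_edges


# Hypothetical graph: node 2 branches to 3 and 4, node 3 joins 1 and 2, so (2, 3) is critical.
NODES = {1, 2, 3, 4}
EDGES = {(1, 3), (2, 3), (2, 4)}
split_nodes, split_edges = split_critical_edges(NODES, EDGES)
```

A synthetic node has exactly one predecessor and one successor, so neither of its two new edges can be critical, and the degrees of all other nodes are unchanged; one pass therefore suffices.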

Nonetheless, we will carefully examine the impact of critical edges for most of 
the presented results. Besides the well-known drawbacks we also found strong 
new arguments giving evidence for the importance of edge splitting. 

Fundamental for the differences between flow graphs with and without critical edges is the following structural property, which exclusively applies to flow graphs in $\mathcal{G}$:

Lemma 2.2.1 (Control Flow Lemma).

1. $\forall\, n \in N.\; |\mathit{pred}(n)| \ge 2 \Rightarrow \mathit{succ}(\mathit{pred}(n)) = \{n\}$

2. $\forall\, n \in N.\; |\mathit{succ}(n)| \ge 2 \Rightarrow \mathit{pred}(\mathit{succ}(n)) = \{n\}$

Proof. For the proof of the first property consider a predecessor $m$ of a node $n$ with $|\mathit{pred}(n)| \ge 2$. Suppose $|\mathit{succ}(m)| \ge 2$; then $(m, n)$ is a critical edge, in contradiction to the assumption. Hence $\mathit{succ}(\mathit{pred}(n)) = \{n\}$ as desired. The second property is symmetric. □
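The Control Flow Lemma can be checked mechanically on a concrete graph. This sketch (helper names are our own) tests both parts; since a critical edge $(m, n)$ always violates part 1 at $n$, the check fails exactly on graphs with critical edges, as the two hypothetical example graphs, one with and one without the critical edge $(2, 3)$, illustrate.

```python
from typing import Dict, Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def neighbour_maps(edges: Set[Edge]):
    """Successor and predecessor maps of a graph given by its edge set."""
    succ: Dict[Hashable, Set[Hashable]] = {}
    pred: Dict[Hashable, Set[Hashable]] = {}
    for m, n in edges:
        succ.setdefault(m, set()).add(n)
        pred.setdefault(n, set()).add(m)
    return succ, pred


def control_flow_lemma_holds(nodes: Set[Hashable], edges: Set[Edge]) -> bool:
    """Check both parts of Lemma 2.2.1 on a concrete flow graph."""
    succ, pred = neighbour_maps(edges)
    for n in nodes:
        preds, succs = pred.get(n, set()), succ.get(n, set())
        # Part 1: if n is a join node, every predecessor has n as its only successor.
        if len(preds) >= 2 and {w for m in preds for w in succ[m]} != {n}:
            return False
        # Part 2: if n is a branch node, every successor has n as its only predecessor.
        if len(succs) >= 2 and {u for m in succs for u in pred[m]} != {n}:
            return False
    return True


WITH_CRITICAL = {(1, 3), (2, 3), (2, 4)}          # (2, 3) is critical
SPLIT = {(1, 3), (2, "S23"), ("S23", 3), (2, 4)}  # after inserting a synthetic node S23
```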



2.3 Program Transformations 

2.3.1 Admissibility 

Considering an optimisation goal of interest, a universe of admissible program transformations is usually determined by means of a general syntactic scheme for the transformations and additional constraints capturing semantic properties that have to be preserved.

Program transformations considered here do not modify the branching structure of a program.^ Therefore, every program path $p$ in a program $G$ has an associated program path in the transformed program $G_{\mathit{TR}}$ that is denoted by $p_{\mathit{TR}}$.

2.3.2 Optimality 

Proving a particular program transformation optimal within a universe of admissible transformations $\mathcal{T}$ requires making explicit a criterion that is suitable to compare the “quality” of distinct program transformations with respect to some optimisation goals of interest. This comparison is accomplished by means of a suitable relation $\lesssim\; \subseteq \mathcal{T} \times \mathcal{T}$ that in general will be a preorder, i.e. a relation that is transitive and reflexive. However, $\lesssim$ usually will not be antisymmetric, which is due to the fact that most optimality criteria are quite liberal, allowing different programs of equal quality. Based on $\lesssim$, a transformation $\mathit{TR} \in \mathcal{T}$ is then called $\lesssim$-optimal if and only if $\forall\, \mathit{TR}' \in \mathcal{T}.\; \mathit{TR}' \lesssim \mathit{TR}$.



^ Steffen [Ste96] and later Bodik, Gupta and Soffa [BGS98] presented approaches that expand the original program in order to eliminate partial redundancies completely, however, at the price of a potentially exponential blow-up of the program.




Overview 



Expression Motion 

Expression motion is a technique for suppressing the computation of partially 
redundant expressions where possible, i. e. expressions that are unnecessarily 
reevaluated on some program paths at run-time. This is achieved by replac- 
ing the original computations of a program by temporary variables (registers) 
that are initialised correctly at suitable program points. A major advantage 
of expression motion is the fact that it uniformly covers loop invariant ex- 
pression motion and the elimination of redundant computations. 

In their seminal paper [MR79] Morel and Renvoise were the first to propose an algorithm for expression motion based upon data flow analysis techniques. Their algorithm triggered a number of variations and improvements mainly focusing on two drawbacks of Morel’s and Renvoise’s algorithm [Cho83, Dha83, Dha88, Dha89b, Dha91, DS88, JD82a, JD82b, Mor84, MR81, Sor89]. First, their algorithm was given in terms of bidirectional data
flow analyses which are in general conceptually and computationally more 
complex than unidirectional ones, and second, expressions are unnecessarily 
moved, a fact which may increase register pressure. In [DRZ92] Dhamdhere, 
Rosen, and Zadeck showed that the original transformation of Morel and Ren- 
voise can be solved as easily as a unidirectional problem. However, they did 
not address the problem of unnecessary code motion. This problem was first 
tackled in [Cho83, Dha88, Dha91] and more recently in [DP93]. However, the 
first three proposals are of heuristic nature, i.e. code is unnecessarily moved 
or redundancies remain in the program, and the latter one is of limited ap- 
plicability: it requires the reducibility of the flow graph under consideration. 

In [KRS92] we developed an algorithm for lazy expression motion which 
evolved from a total redesign of Morel’s and Renvoise’s algorithm starting 
from a specification point of view. In fact, our algorithm was the first that 
succeeded in solving both deficiencies of Morel’s and Renvoise’s algorithm 
optimally: the algorithm is completely based on simple, purely unidirectional 
data flow analyses^ and suppresses any unnecessary code motion.^ Moreover, for a given expression this placing strategy minimises the lifetime range of the
temporary that is associated with the expression: any other computationally 
optimal expression motion has to cover this lifetime range as well. 

Expression Motion of Multiple Expressions 

Typically, algorithms for expression motion are presented with respect to 
a fixed but arbitrary expression pattern. This is due to the fact that an 

^ The idea for this was first proposed in [Ste91].

^ This algorithm was later interprocedurally generalised to programs with procedures, local and global variables and formal value parameters in [KS92, Kno93, KRS96b].



O. Ruething: Interacting Code Motion Transformations, LNCS 1539, pp. 19-20, 1998. 
© Springer- Verlag Berlin Heidelberg 1998 







extension to multiple expression patterns is straightforward for sets of ex- 
pressions with a flat structure, i. e. sets of expressions that do not contain 
both expressions and their subexpressions. In this case a simultaneous algo- 
rithm is essentially determined as the independent combination of all individ- 
ual transformations. Such a situation is for instance given when considering 
programs whose expressions are completely decomposed into three-address 
format. Even though such a decomposition severely weakens the power of expression motion, it has been fairly standard since Morel’s and Renvoise’s seminal paper [MR79]. In fact, all relevant algorithms are based on this separation paradigm of Morel/Renvoise-like expression motion. Whereas giving up this paradigm does not severely influence considerations with respect to computational optimality, it affects lifetime considerations. This observation has not yet entered the reasoning on the register pressure problem in expression motion papers. This is not surprising, as the problem requires one to cope with subtle trade-offs between the lifetimes of different temporary variables. Nonetheless, the problem can be tackled and solved efficiently: in
Chapter 4 we present the first algorithm for lifetime optimal expression mo- 
tion that adequately copes with large expressions and their subexpressions 
simultaneously. 

Expression Motion and Critical Edges 

It is well-known that critical edges are the reason for various problems in 
expression motion. The major deficiencies are that critical edges may cause 
poor transformations due to the lack of suitable placement points and higher 
solution costs of the associated data flow analyses. In Chapter 5 we investigate 
the reasons for known and for new difficulties caused by the presence of 
critical edges. 



Conventions 

As in [KRS92] we consider flow graphs whose nodes are elementary state- 
ments rather than basic blocks. Following the lines of [KRS94a] we can eas- 
ily develop algorithms where the global data flow analyses operate on basic 
blocks whose instructions are only inspected once in a preprocess. More- 
over, we primarily investigate the global aspects of lifetimes, i. e. we abstract 
from lifetimes of temporaries that do not survive the boundaries of a node in 
the flow graph. However, local aspects of lifetimes can be completely captured 
within a postprocess of our algorithm that requires one additional analysis 
(cf. [KRS92, KRS94a]). The whole part refers to a fixed flow graph $G$ which, however, in Chapter 3 and Chapter 4 is assumed to be out of $\mathcal{G}$ and in Chapter 5 to be out of $\mathcal{G}_{\mathit{Crit}}$.

^ Note that such an adaptation is only superior from a pragmatic point of view but does not reduce the asymptotic computational complexity.




3. Optimal Expression Motion: The 
Single-Expression View 



In this chapter we investigate expression motion for an arbitrary but fixed expression pattern $\varphi$. In particular, we recall busy and lazy expression motion as introduced in [KRS94a], which are of major importance due to their role as the basic components for the multiple-expression approaches presented later. Throughout this chapter Figure 3.1 serves as a running example, as it is complex enough to illustrate the essential features of our approach.^




^ We left out synthetic nodes on the critical edges (1, 4), (4, 12), (7, 7) and (7, 11), 
since they are irrelevant for the expression motions presented. 



3.1 Expression Motion 

Following [KRS92, KRS94a] we introduce:

Definition 3.1.1 (Expression Motion).

An expression motion is a program transformation that

— inserts some instances of initialisation statements $h_\varphi := \varphi$ at program points, where $h_\varphi$ is a temporary variable (or symbolic register) that is exclusively assigned to the expression pattern $\varphi$, and

— replaces some original occurrences of $\varphi$ by a usage of $h_\varphi$.

Hence an expression motion $\mathit{EM}_\varphi$ is completely characterised by two predicates on program points in $\dot N$:

— $\mathit{EM}_\varphi\text{-}\mathit{Insert}$, determining at which program points initialisations $h_\varphi := \varphi$ are inserted, and

— $\mathit{EM}_\varphi\text{-}\mathit{Replace}$, specifying those program points where an original occurrence of the expression pattern $\varphi$ is replaced by $h_\varphi$.

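Local Predicates. For each node $n \in N$ two local predicates are defined, indicating whether $\varphi$ is computed or modified by $n$, respectively.

— $\mathit{Comp}_n$: $\varphi$ occurs as an original computation within $n$.

— $\mathit{Transp}_n$: $\varphi$ is transparent at $n$, i.e. the left-hand side variable of $n$ is not an operand of $\varphi$.

The two local predicates are extended to program points as well. To this end we define $\mathit{Comp}_{\dot n} \stackrel{\mathrm{def}}{\Leftrightarrow} \mathit{Comp}_n$ for the entry point $\dot n$ of a node $n$ and $\mathit{Comp}_{\dot n} \stackrel{\mathrm{def}}{\Leftrightarrow} \mathit{false}$ for any exit point $\dot n$. Similarly, $\mathit{Transp}_{\dot n} \stackrel{\mathrm{def}}{\Leftrightarrow} \mathit{Transp}_n$ for the entry point $\dot n$ of a node $n$ and $\mathit{Transp}_{\dot n} \stackrel{\mathrm{def}}{\Leftrightarrow} \mathit{true}$ for any exit point $\dot n$.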
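The local predicates and their lifting to program points can be sketched in Python as follows. The statement encoding is an assumption made only for this illustration: a node maps to `None` for skip, or to a pair `(lhs, rhs)` whose `lhs` is `None` for the immobile statements out and cond. As in the syntactic setting of the text, $\mathit{Transp}_n$ only asks whether the left-hand side variable is an operand of $\varphi$.

```python
from typing import Dict, Hashable, Optional, Set, Tuple, Union

Expr = Union[str, Tuple]                     # name, or (operator, operand, ...)
Stmt = Optional[Tuple[Optional[str], Expr]]  # None = skip; (lhs, rhs); lhs None for out/cond


def occurs(phi: Expr, e: Expr) -> bool:
    """Does phi occur in e, either as e itself or as a mediate subexpression?"""
    return e == phi or (isinstance(e, tuple) and any(occurs(phi, x) for x in e[1:]))


def operands(e: Expr) -> Set[str]:
    """Leaf names of e (constants are treated like variable names for simplicity)."""
    if isinstance(e, str):
        return {e}
    return {v for x in e[1:] for v in operands(x)}


def local_predicates(stmts: Dict[Hashable, Stmt], phi: Expr):
    """Node-level Comp_n and Transp_n for the expression pattern phi."""
    comp = {n for n, st in stmts.items() if st is not None and occurs(phi, st[1])}
    transp = {n for n, st in stmts.items() if st is None or st[0] not in operands(phi)}
    return comp, transp


def lift_to_points(nodes, comp, transp):
    """Comp at entry = Comp_n, at exit = false; Transp at entry = Transp_n, at exit = true."""
    comp_pts = {(n, "entry") for n in nodes if n in comp}
    transp_pts = {(n, "entry") for n in nodes if n in transp} | {(n, "exit") for n in nodes}
    return comp_pts, transp_pts


# Hypothetical program: x := a+b at 1, a := c at 2, out(a+b) at 3.
PHI = ("+", "a", "b")
STMTS = {"s": None, "1": ("x", PHI), "2": ("a", "c"), "3": (None, PHI), "e": None}
```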

The local predicates are the propositions used for specifying global program properties of interest. This is usually done by means of path formulas in predicate logic.^ Therefore, we assume that the priority $\mathit{pr}$ of operators and quantifiers^ is ordered by

$$\mathit{pr}(\neg) > \mathit{pr}(\wedge) > \mathit{pr}(\vee) > \mathit{pr}(\Rightarrow) = \mathit{pr}(\Leftrightarrow) > \mathit{pr}(\forall) = \mathit{pr}(\exists) = \mathit{pr}(\exists!)$$

3.1.1 Safety and Correctness 

In order to guarantee that the semantics of the argument program is preserved, we require that the expression motion must be admissible. Intuitively, this means that every insertion of a computation is safe, i.e. on no program path is the computation of a new value introduced at initialisation sites, and that every substitution of an original occurrence of $\varphi$ by $h_\varphi$ is correct, i.e. $h_\varphi$ always represents the same value as $\varphi$ at use sites. This requires that $h_\varphi$ is properly initialised on every program path leading to some use site in such a way that no modification occurs afterwards.^ For a formal definition we introduce two predicates capturing whether a potential insertion of an initialisation statement $h_\varphi := \varphi$ at a program point is safe or whether a potential replacement of an original occurrence of $\varphi$ at a program point is correct, respectively.^ The correctness predicate, however, is relative to the expression motion under consideration.

^ Steffen [Ste91, Ste93] considers similar specifications in terms of modal logic formulas that can be automatically transformed into corresponding analysers.

^ $\exists!$ stands for definite existential quantification.

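Definition 3.1.2 (Safety & Correctness).

Let $\mathit{EM}_\varphi$ be an expression motion and $\dot n \in \dot N$. Then

1. $$\mathit{Safe}_{\dot n} \stackrel{\mathrm{def}}{\Leftrightarrow} \forall\, p \in \mathbf{P}[\dot s, \dot e]\; \forall\, i \le |p|.\; p_i = \dot n \Rightarrow \underbrace{\exists\, j < i.\; \mathit{Comp}_{p_j} \wedge \forall\, j \le k < i.\; \mathit{Transp}_{p_k}}_{(i)} \;\vee\; \underbrace{\exists\, j \ge i.\; \mathit{Comp}_{p_j} \wedge \forall\, i \le k < j.\; \mathit{Transp}_{p_k}}_{(ii)}$$

2. $$\mathit{EM}_\varphi\text{-}\mathit{Correct}_{\dot n} \stackrel{\mathrm{def}}{\Leftrightarrow} \forall\, p \in \mathbf{P}[\dot s, \dot n]\; \exists\, i \le |p|.\; \mathit{EM}_\varphi\text{-}\mathit{Insert}_{p_i} \wedge \forall\, i \le j < |p|.\; \mathit{Transp}_{p_j}$$

Restricting the definition of safety to only the term marked (i) or (ii) induces predicates for up-safety and down-safety, respectively, which are denoted by $\mathit{UpSafe}$ and $\mathit{DnSafe}$.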

Based upon Definition 3.1.2 we now formally define the notion of admissibil- 
ity. 

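Definition 3.1.3 (Admissible Expression Motion). An expression motion $\mathit{EM}_\varphi$ is admissible if and only if it satisfies the following two conditions:

1. Insertions of assignments $h_\varphi := \varphi$ are restricted to safe program points, i.e.

$$\forall\, \dot n \in \dot N.\; \mathit{EM}_\varphi\text{-}\mathit{Insert}_{\dot n} \Rightarrow \mathit{Safe}_{\dot n}$$

2. Original occurrences of $\varphi$ are only replaced at program points where their replacement is correct:

$$\forall\, \dot n \in \dot N.\; \mathit{EM}_\varphi\text{-}\mathit{Replace}_{\dot n} \Rightarrow \mathit{EM}_\varphi\text{-}\mathit{Correct}_{\dot n}$$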



^ As usual we are exclusively dealing with syntactic expression motion here, where every left-hand side occurrence of an operand of $\varphi$ is assumed to change the value of $\varphi$. For an expression motion algorithm that also captures semantic properties see [SKR90, SKR91, KRS98].

^ Recall that every node lies on a path from $s$ to $e$.
The set of all admissible expression motions with respect to $\varphi$ is denoted by $\mathcal{AEM}_\varphi$.

Let us now take a look at some important properties of safety and correctness. 
The first one is that safety can perfectly be decomposed into up-safety and 
down-safety, a fact which is important for the actual computation of this 
property (cf. Section 3.5). 

Lemma 3.1.1 (Safety Lemma). $\forall\, \dot n \in \dot N.\; \mathit{Safe}_{\dot n} \Leftrightarrow \mathit{UpSafe}_{\dot n} \vee \mathit{DnSafe}_{\dot n}$
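Safety can be computed by two unidirectional fixpoint analyses over the program-point graph: down-safety backwards and up-safety forwards, with $\mathit{Safe}$ then obtained as their disjunction by Lemma 3.1.1. The equations below are our own reading of the path formulas of Definition 3.1.2, e.g. $\mathit{DnSafe}_{\dot n} \Leftrightarrow \mathit{Comp}_{\dot n} \vee (\mathit{Transp}_{\dot n} \wedge \forall\, \dot m \in \mathit{succ}(\dot n).\, \mathit{DnSafe}_{\dot m})$ with $\mathit{DnSafe}$ false at $\dot e$, solved as greatest fixpoints; they are not the algorithm of Section 3.5, which lies outside this excerpt. The flow graph is hypothetical: $x := a+b$ at node 2, $a := 0$ at node 3 (killing $a+b$), $z := c$ at node 5 and $y := a+b$ at node 4.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

Point = Tuple[Hashable, str]  # (node, "entry") or (node, "exit")


def program_point_graph(nodes, edges):
    """entry(n) -> exit(n) for each node n; exit(m) -> entry(n) for each edge (m, n)."""
    succ: Dict[Point, Set[Point]] = {}
    for n in nodes:
        succ[(n, "entry")] = {(n, "exit")}
        succ[(n, "exit")] = {(k, "entry") for (m, k) in edges if m == n}
    pred: Dict[Point, Set[Point]] = {p: set() for p in succ}
    for p, targets in succ.items():
        for q in targets:
            pred[q].add(p)
    return succ, pred


def greatest_fixpoint(points: Set[Point], holds: Callable[[Point, Set[Point]], bool]) -> Set[Point]:
    """Start from 'true everywhere' and remove points until the equation system is stable."""
    value = set(points)
    changed = True
    while changed:
        changed = False
        for p in list(value):
            if not holds(p, value):
                value.discard(p)
                changed = True
    return value


# Hypothetical flow graph (not Figure 3.1).
NODES = ["s", "1", "2", "3", "5", "4", "e"]
EDGES = {("s", "1"), ("1", "2"), ("1", "3"), ("3", "5"), ("2", "4"), ("5", "4"), ("4", "e")}
succ, pred = program_point_graph(NODES, EDGES)
COMP = {("2", "entry"), ("4", "entry")}
TRANSP = set(succ) - {("3", "entry")}


def dn_safe_eq(p: Point, cur: Set[Point]) -> bool:
    # Comp, or transparent with all successors down-safe; the end point (no successors) needs Comp.
    return p in COMP or (p in TRANSP and bool(succ[p]) and succ[p] <= cur)


def up_safe_eq(p: Point, cur: Set[Point]) -> bool:
    # Every predecessor is transparent and computes phi or is up-safe; false at the start point.
    return bool(pred[p]) and all(m in TRANSP and (m in COMP or m in cur) for m in pred[p])


DN_SAFE = greatest_fixpoint(set(succ), dn_safe_eq)
UP_SAFE = greatest_fixpoint(set(succ), up_safe_eq)
SAFE = UP_SAFE | DN_SAFE  # Lemma 3.1.1
```

On this graph the exit of node 1 is unsafe: inserting $a+b$ there would introduce a new computation on the path through node 3, where $a$ is redefined before the next use.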

Moreover, for an admissible expression motion correctness also ensures safety, as we can exploit the implication of Definition 3.1.3(1) for the path characterisation of correctness.

Lemma 3.1.2 (Correctness Lemma).

$$\forall\, \mathit{EM}_\varphi \in \mathcal{AEM}_\varphi,\; \dot n \in \dot N.\; \mathit{EM}_\varphi\text{-}\mathit{Correct}_{\dot n} \Rightarrow \mathit{Safe}_{\dot n}$$

The formal proofs of Lemma 3.1.1 as well as of Lemma 3.1.2 can be found in [KRS94a].

3.2 Busy Expression Motion 

In this section we recall busy expression motion as introduced in [KRS92, 
KRS94a]. This transformation is of particular importance, since it provides 
the key for the characterisation of all other computationally optimal expres- 
sion motions. 

3.2.1 Computational Optimality 

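The primary goal of expression motion is to minimise the number of computations on every program path. This intent is reflected by the following relation. An expression motion $\mathit{EM}_\varphi \in \mathcal{AEM}_\varphi$ is computationally better^ than an expression motion $\mathit{EM}'_\varphi \in \mathcal{AEM}_\varphi$, in symbols $\mathit{EM}'_\varphi \lesssim^{\mathit{exp}}_\varphi \mathit{EM}_\varphi$, if and only if

$$\forall\, p \in \mathbf{P}[\dot s, \dot e].\; \mathit{Comp}^\#(p, \mathit{EM}_\varphi) \le \mathit{Comp}^\#(p, \mathit{EM}'_\varphi),$$

where $\mathit{Comp}^\#(p, \mathit{EM}_\varphi)$ denotes the number of computations of $\varphi$ that occur on the path $p \in \mathbf{P}[\dot s, \dot e]$ after applying the expression motion $\mathit{EM}_\varphi$, i.e.

$$\mathit{Comp}^\#(p, \mathit{EM}_\varphi) \stackrel{\mathrm{def}}{=} |\{\, i \mid \mathit{EM}_\varphi\text{-}\mathit{Insert}_{p_i} \,\}| + |\{\, i \mid \mathit{Comp}_{p_i} \wedge \neg \mathit{EM}_\varphi\text{-}\mathit{Replace}_{p_i} \,\}|$$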
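The counting function $\mathit{Comp}^\#$ and the induced comparison can be sketched directly. Since $\mathbf{P}[\dot s, \dot e]$ is infinite for programs with loops, the sketch can only compare expression motions on a given finite set of paths. Here we use the two paths of a small hypothetical acyclic graph ($a+b$ computed at nodes 2 and 4, killed at node 3), and an expression motion is represented, by our own convention, as a pair of insertion and replacement point sets.

```python
from typing import Hashable, List, Sequence, Set, Tuple

Point = Tuple[Hashable, str]        # (node, "entry") or (node, "exit")
EM = Tuple[Set[Point], Set[Point]]  # (insertion points, replacement points)


def points_along(node_path: Sequence[Hashable]) -> List[Point]:
    """Program-point path visiting the entry and exit of every node in order."""
    return [(n, side) for n in node_path for side in ("entry", "exit")]


def comp_count(path: Sequence[Point], em: EM, comp: Set[Point]) -> int:
    """Comp#(p, EM) = |{i | Insert at p_i}| + |{i | Comp at p_i and not Replace at p_i}|."""
    insert, replace = em
    return sum(pt in insert for pt in path) + sum(pt in comp and pt not in replace for pt in path)


def comp_better(em: EM, other: EM, paths, comp: Set[Point]) -> bool:
    """em is computationally better than other: at most as many computations on every path."""
    return all(comp_count(p, em, comp) <= comp_count(p, other, comp) for p in paths)


# Hypothetical graph: s -> 1 -> {2, 3}, 3 -> 5, {2, 5} -> 4 -> e.
COMP = {("2", "entry"), ("4", "entry")}
PATHS = [points_along("s 1 2 4 e".split()), points_along("s 1 3 5 4 e".split())]
IDENTITY: EM = (set(), set())                             # leaves the program unchanged
BUSY_LIKE: EM = ({("2", "entry"), ("3", "exit")}, set(COMP))  # insert earliest, replace all
```

The identity transformation computes $a+b$ twice on the first path, while the second expression motion computes it once on each path; hence the latter is strictly better on this path set.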

Obviously, $\lesssim^{\mathit{exp}}_\varphi$ defines a preorder on $\mathcal{AEM}_\varphi$. Based on this preorder we now define:

^ Note that this relation is reflexive. In fact, computationally at least as good would be the more precise but uglier term.
Definition 3.2.1 (Computational Optimality). An admissible expression motion $\mathit{EM}_\varphi \in \mathcal{AEM}_\varphi$ is computationally optimal if and only if it is computationally better than any other admissible expression motion.

Let us denote the set of computationally optimal expression motions with respect to $\varphi$ by $\mathcal{COEM}_\varphi$.

3.2.2 The Transformation 

Busy expression motion is characterised by introducing its insertions at those program points that are safe and where an “earlier” computation of $\varphi$ would not be safe. Thus earliest program points can be considered as the upper borderline of the region of safe program points.

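Definition 3.2.2 (Earliestness). For every $\dot n \in \dot N$

$$\mathit{Earliest}_{\dot n} \stackrel{\mathrm{def}}{\Leftrightarrow} \underbrace{\mathit{Safe}_{\dot n}}_{(\star)} \wedge \big((\dot n = \dot s) \vee \exists\, \dot m \in \mathit{pred}(\dot n).\; \neg \mathit{Transp}_{\dot m} \vee \neg \mathit{Safe}_{\dot m}\big)$$

Formally, busy expression motion ($\mathit{BEM}_\varphi$) is then defined as follows:

— Insert initialisation statements $h_\varphi := \varphi$ at every program point $\dot n$ satisfying $\mathit{Earliest}_{\dot n}$.

— Replace every original occurrence of $\varphi$ by $h_\varphi$.

Remark 3.2.1. As shown in [KRS94a], Definition 3.2.2 can be strengthened without loss of generality by using $\mathit{DnSafe}_{\dot n}$ instead of $(\star)$ and universal instead of existential quantification.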
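Given the safety information, earliestness is a purely local check. The sketch below takes $\mathit{Safe}$ and $\mathit{DnSafe}$ as precomputed input (the sets are values we derived by hand from Definition 3.1.2 for the hypothetical graph described in the comments) and evaluates both Definition 3.2.2 and the strengthened variant of Remark 3.2.1, which coincide on this example.

```python
from typing import Dict, Hashable, Set, Tuple

Point = Tuple[Hashable, str]  # (node, "entry") or (node, "exit")


def program_point_graph(nodes, edges):
    """entry(n) -> exit(n) for each node n; exit(m) -> entry(n) for each edge (m, n)."""
    succ: Dict[Point, Set[Point]] = {}
    for n in nodes:
        succ[(n, "entry")] = {(n, "exit")}
        succ[(n, "exit")] = {(k, "entry") for (m, k) in edges if m == n}
    pred: Dict[Point, Set[Point]] = {p: set() for p in succ}
    for p, targets in succ.items():
        for q in targets:
            pred[q].add(p)
    return succ, pred


def earliest(points, pred, transp, safe, start):
    """Definition 3.2.2: safe, and the start point or some predecessor non-transparent or unsafe."""
    return {p for p in points
            if p in safe and (p == start or any(m not in transp or m not in safe for m in pred[p]))}


def earliest_strengthened(points, pred, transp, safe, dn_safe, start):
    """Remark 3.2.1: DnSafe in the first conjunct and universal quantification over predecessors."""
    return {p for p in points
            if p in dn_safe and (p == start or all(m not in transp or m not in safe for m in pred[p]))}


# Hypothetical graph: x := a+b at 2, a := 0 at 3 (kills a+b), z := c at 5, y := a+b at 4.
NODES = ["s", "1", "2", "3", "5", "4", "e"]
EDGES = {("s", "1"), ("1", "2"), ("1", "3"), ("3", "5"), ("2", "4"), ("5", "4"), ("4", "e")}
succ, pred = program_point_graph(NODES, EDGES)
TRANSP = set(succ) - {("3", "entry")}
# Down-safety and safety for this graph, derived by hand from Definition 3.1.2.
DN_SAFE = {("2", "entry"), ("2", "exit"), ("3", "exit"), ("5", "entry"), ("5", "exit"), ("4", "entry")}
SAFE = DN_SAFE | {("4", "exit"), ("e", "entry"), ("e", "exit")}
START = ("s", "entry")

EARLIEST = earliest(set(succ), pred, TRANSP, SAFE, START)
EARLIEST_STRONG = earliest_strengthened(set(succ), pred, TRANSP, SAFE, DN_SAFE, START)
```

Busy expression motion thus initialises $h$ at the entry of node 2 and directly after the kill at node 3, and replaces both original computations.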

As a result we have (cf. [KRS94a]): 

Theorem 3.2.1 (Optimality Theorem for $\mathit{BEM}_\varphi$).

$\mathit{BEM}_\varphi$ is computationally optimal, i.e. $\mathit{BEM}_\varphi \in \mathcal{COEM}_\varphi$.

Figure 3.2 presents busy expression motion for our running example of Figure 3.1.^ In particular, the algorithm eliminates the loop invariant computation of $a + b$ at node 5 and succeeds in removing the partially redundant recomputation of $a + b$ at node 9.

Finally, for $\mathit{BEM}_\varphi$ an interesting correspondence between safety and correctness can be established, which complements Lemma 3.1.2.

Lemma 3.2.1 ($\mathit{BEM}_\varphi$-Correctness Lemma).

$$\forall\, \dot n \in \dot N.\; \mathit{BEM}_\varphi\text{-}\mathit{Correct}_{\dot n} \Leftrightarrow \mathit{Safe}_{\dot n}$$

Whereas the forward implication is due to Lemma 3.1.2, Lemma 3.11(1) of [KRS94a] proves an alternative formulation of the backward direction.

^ The index $a + b$ of the temporary variable $h$ is omitted in the interest of brevity.
Fig. 3.2. $\mathit{BEM}_{a+b}$



3.3 Lazy Expression Motion 

The busy approach to expression motion of the previous section minimises the number of computations of $\varphi$ at run-time; however, it does not take into account the lifetime ranges of the temporary variable $h_\varphi$. Since unnecessarily large lifetime ranges of temporaries may cause superfluous register pressure, we are interested in a computationally optimal expression motion whose register usage is as economical as possible.

Thus we need to provide a reasonable definition for the lifetime range that is associated with $h_\varphi$ under a particular expression motion. Since we will separate the global aspects of lifetimes from the local ones, an alternative, yet simpler definition compared to the one in [KRS94a] is sufficient.



3.3.1 Lifetime Optimality 

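Let us first introduce the set of paths leading from an insertion point to a replacement site, which shall be called insertion-replacement paths. For $\mathit{EM}_\varphi \in \mathcal{AEM}_\varphi$ we have^

$$\mathit{IRP}(\mathit{EM}_\varphi) \stackrel{\mathrm{def}}{=} \{\, p \in \mathbf{P} \mid |p| \ge 2 \wedge \underbrace{\mathit{EM}_\varphi\text{-}\mathit{Insert}_{p_1}}_{(i)} \wedge \underbrace{\mathit{EM}_\varphi\text{-}\mathit{Replace}_{p_{|p|}}}_{(ii)} \wedge \underbrace{\forall\, 1 < i \le |p|.\; \neg \mathit{EM}_\varphi\text{-}\mathit{Insert}_{p_i}}_{(iii)} \,\}$$

^ Of course, the path is built of program points.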

In the above characterisation term (i) describes an initialisation at program point $p_1$, term (ii) addresses a replacement at $p_{|p|}$, while term (iii) ensures that the initialisation according to (i) is actually one that belongs to the replacement according to (ii). Then the lifetime range with respect to an expression motion $\mathit{EM}_\varphi \in \mathcal{AEM}_\varphi$ defines the set of program points where the value of $\varphi$ is kept in the temporary $h_\varphi$ and cannot be released:

Definition 3.3.1 (Lifetime Range).

$$\mathit{LtRg}(\mathit{EM}_\varphi) \stackrel{\mathrm{def}}{=} \bigcup_{p \in \mathit{IRP}(\mathit{EM}_\varphi)} \{\, p_i \mid 1 \le i < |p| \,\}$$
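Definition 3.3.1 quantifies over insertion-replacement paths, but $\mathit{LtRg}$ can be computed without enumerating them. In our own reformulation (not taken from the text), $\dot n \in \mathit{LtRg}(\mathit{EM}_\varphi)$ iff $\dot n$ is reached from an insertion point without passing a further insertion, and from $\dot n$ a replacement site is reachable along at least one edge without meeting an insertion after $\dot n$; concatenating the two witnessing paths yields an insertion-replacement path, since paths may revisit points. Both conditions are least fixpoints. The example compares two insertion sets for a small hypothetical graph, which by our hand computation are the busy and the lazy placements for that graph.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

Point = Tuple[Hashable, str]  # (node, "entry") or (node, "exit")


def program_point_graph(nodes, edges):
    """entry(n) -> exit(n) for each node n; exit(m) -> entry(n) for each edge (m, n)."""
    succ: Dict[Point, Set[Point]] = {}
    for n in nodes:
        succ[(n, "entry")] = {(n, "exit")}
        succ[(n, "exit")] = {(k, "entry") for (m, k) in edges if m == n}
    pred: Dict[Point, Set[Point]] = {p: set() for p in succ}
    for p, targets in succ.items():
        for q in targets:
            pred[q].add(p)
    return succ, pred


def least_fixpoint(points: Set[Point], holds: Callable[[Point, Set[Point]], bool]) -> Set[Point]:
    """Start from 'false everywhere' and add points until the equation system is stable."""
    value: Set[Point] = set()
    changed = True
    while changed:
        changed = False
        for p in points:
            if p not in value and holds(p, value):
                value.add(p)
                changed = True
    return value


def lifetime_range(succ, pred, insert: Set[Point], replace: Set[Point]) -> Set[Point]:
    points = set(succ)
    # reached(p): an insertion reaches p with no further insertion in between (p may be one itself).
    reached = least_fixpoint(points, lambda p, cur: p in insert or any(m in cur for m in pred[p]))
    # used(p): a replacement site follows along >= 1 edge, with no insertion after p.
    used = least_fixpoint(points, lambda p, cur: any(
        q not in insert and (q in replace or q in cur) for q in succ[p]))
    return reached & used


# Hypothetical graph: x := a+b at 2, a := 0 at 3, z := c at 5, y := a+b at 4.
NODES = ["s", "1", "2", "3", "5", "4", "e"]
EDGES = {("s", "1"), ("1", "2"), ("1", "3"), ("3", "5"), ("2", "4"), ("5", "4"), ("4", "e")}
succ, pred = program_point_graph(NODES, EDGES)
REPLACE = {("2", "entry"), ("4", "entry")}    # both original computations of a+b
BUSY_INSERT = {("2", "entry"), ("3", "exit")}  # earliest points (hand computation)
LAZY_INSERT = {("2", "entry"), ("5", "exit")}  # latest points (hand computation)
BUSY_RANGE = lifetime_range(succ, pred, BUSY_INSERT, REPLACE)
LAZY_RANGE = lifetime_range(succ, pred, LAZY_INSERT, REPLACE)
```

As expected from the discussion of busy and lazy expression motion, the lazy placement yields a strictly smaller lifetime range on this graph.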

Based upon this definition we obtain a notion of lifetime optimality. 

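Definition 3.3.2 (Lifetime Optimality).

1. An expression motion $\mathit{EM}_\varphi \in \mathcal{COEM}_\varphi$ is lifetime better than an expression motion $\mathit{EM}'_\varphi \in \mathcal{COEM}_\varphi$, in symbols $\mathit{EM}'_\varphi \lesssim^{\mathit{lt}}_\varphi \mathit{EM}_\varphi$, if and only if

$$\mathit{LtRg}(\mathit{EM}_\varphi) \subseteq \mathit{LtRg}(\mathit{EM}'_\varphi)$$

2. An expression motion $\mathit{EM}_\varphi \in \mathcal{COEM}_\varphi$ is lifetime optimal if and only if

$$\forall\, \mathit{EM}'_\varphi \in \mathcal{COEM}_\varphi.\; \mathit{EM}'_\varphi \lesssim^{\mathit{lt}}_\varphi \mathit{EM}_\varphi$$

It should be noted that $\lesssim^{\mathit{lt}}_\varphi$ is a preorder like $\lesssim^{\mathit{exp}}_\varphi$. However, we can easily see that lifetime optimal expression motions may only differ in insertions that are solely used immediately afterwards.

Figure 3.3 shows the lifetime range of busy expression motion for our running example.^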

3.3.2 The Transformation 

Whereas $\mathit{BEM}_\varphi$ realises an as-early-as-possible strategy for the movement of expressions, lazy expression motion aims at placing computations as late as possible, but as early as necessary (in order to yield computational optimality). Technically, this is accomplished by delaying the $\mathit{BEM}_\varphi$-initialisations on every program path reaching $\dot e$ as long as no redundancies are reestablished. Therefore, we introduce a predicate that captures whether the $\mathit{BEM}_\varphi$-insertions can be “delayed” to a program point.

^ For the sake of presentation lifetime ranges are displayed as continuous ranges including entries of use-sites.
Fig. 3.3. The lifetime range associated with $\mathit{BEM}_{a+b}$



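Definition 3.3.3 (Delayability). For any $\dot n \in \dot N$

$$\mathit{Delayed}_{\dot n} \stackrel{\mathrm{def}}{\Leftrightarrow} \forall\, p \in \mathbf{P}[\dot s, \dot n]\; \exists\, i \le |p|.\; \mathit{Earliest}_{p_i} \wedge \forall\, i \le j < |p|.\; \neg \mathit{Comp}_{p_j}$$

Similar to earliest computation points, which were considered upper borders of the region of safe program points, latest computation points are defined as the lower borders of the region of “delayable” program points.

Definition 3.3.4 (Latestness). For any $\dot n \in \dot N$

$$\mathit{Latest}_{\dot n} \stackrel{\mathrm{def}}{\Leftrightarrow} \mathit{Delayed}_{\dot n} \wedge \big(\mathit{Comp}_{\dot n} \vee \exists\, \dot m \in \mathit{succ}(\dot n).\; \neg \mathit{Delayed}_{\dot m}\big)$$

Then lazy expression motion ($\mathit{LEM}_\varphi$) is formally defined as follows:

— Insert initialisation statements $h_\varphi := \varphi$ at every program point $\dot n$ satisfying $\mathit{Latest}_{\dot n}$.

— Replace every original occurrence of $\varphi$ by $h_\varphi$.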
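Delayability is again a forward greatest-fixpoint property, and latestness a local check on top of it. The equation used here, $\mathit{Delayed}_{\dot n} \Leftrightarrow \mathit{Earliest}_{\dot n} \vee (\dot n \ne \dot s \wedge \forall\, \dot m \in \mathit{pred}(\dot n).\, \mathit{Delayed}_{\dot m} \wedge \neg \mathit{Comp}_{\dot m})$, is our own reading of Definition 3.3.3; the earliest points are supplied as input, as we computed them by hand for a small hypothetical graph.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

Point = Tuple[Hashable, str]  # (node, "entry") or (node, "exit")


def program_point_graph(nodes, edges):
    """entry(n) -> exit(n) for each node n; exit(m) -> entry(n) for each edge (m, n)."""
    succ: Dict[Point, Set[Point]] = {}
    for n in nodes:
        succ[(n, "entry")] = {(n, "exit")}
        succ[(n, "exit")] = {(k, "entry") for (m, k) in edges if m == n}
    pred: Dict[Point, Set[Point]] = {p: set() for p in succ}
    for p, targets in succ.items():
        for q in targets:
            pred[q].add(p)
    return succ, pred


def greatest_fixpoint(points: Set[Point], holds: Callable[[Point, Set[Point]], bool]) -> Set[Point]:
    """Start from 'true everywhere' and remove points until the equation system is stable."""
    value = set(points)
    changed = True
    while changed:
        changed = False
        for p in list(value):
            if not holds(p, value):
                value.discard(p)
                changed = True
    return value


def delayed(points, pred, comp, earliest_pts) -> Set[Point]:
    """Earliest, or (not the start point and) every predecessor delayed and not computing phi."""
    return greatest_fixpoint(points, lambda p, cur: p in earliest_pts or (
        bool(pred[p]) and all(m in cur and m not in comp for m in pred[p])))


def latest(succ, comp, delayed_pts) -> Set[Point]:
    """Delayed, and either computing phi or having a successor that is not delayed."""
    return {p for p in delayed_pts if p in comp or any(q not in delayed_pts for q in succ[p])}


# Hypothetical graph: x := a+b at 2, a := 0 at 3, z := c at 5, y := a+b at 4.
NODES = ["s", "1", "2", "3", "5", "4", "e"]
EDGES = {("s", "1"), ("1", "2"), ("1", "3"), ("3", "5"), ("2", "4"), ("5", "4"), ("4", "e")}
succ, pred = program_point_graph(NODES, EDGES)
COMP = {("2", "entry"), ("4", "entry")}
EARLIEST = {("2", "entry"), ("3", "exit")}  # hand-computed earliest points for this graph
DELAYED = delayed(set(succ), pred, COMP, EARLIEST)
LATEST = latest(succ, COMP, DELAYED)
```

On this graph the initialisation after the kill at node 3 is pushed down to the exit of node 5, immediately before the join at node 4, while the one at node 2 cannot move because node 2 itself computes $a+b$.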






As a result we have (cf. [KRS94a]):

Theorem 3.3.1 (Optimality Theorem for $\mathit{LEM}_\varphi$).

1. $\mathit{LEM}_\varphi$ is computationally optimal, i.e. $\mathit{LEM}_\varphi \in \mathcal{COEM}_\varphi$.

2. $\mathit{LEM}_\varphi$ is lifetime optimal.

Figure 3.4 shows the program that results from applying $\mathit{LEM}_{a+b}$ to our running example of Figure 3.1. We can see that the corresponding lifetime range is now significantly smaller than the one of $\mathit{BEM}_{a+b}$ (see Figure 3.3).




Fig. 3.4. $\mathit{LEM}_{a+b}$ and the corresponding lifetime range



In [KRS94a] $\mathit{LEM}_\varphi$ as presented here is called almost lazy code motion. This is due to the fact that the notion of lifetime ranges used there also incorporates purely local parts where initialisations are only used at the same program point immediately afterwards. In [KRS92, KRS94a] such initialisations are called isolated and are identified by means of an additional analysis. As mentioned before, in this monograph we abstract from local aspects of lifetime ranges for the sake of presentation. This, however, does not mean any restriction, since the local aspects can be treated independently by means of a postprocess. For instance, in Figure 3.4 we could eliminate the isolated initialisations at nodes 3 and 11 by reestablishing the original assignments $x := a + b$. In contrast, the initialisation at node 8 cannot be eliminated, since $h$ is also used at node 9.



3.4 A Classification of $\mathcal{COEM}_\varphi$

Busy and lazy expression motion are the two extremal computationally optimal strategies in expression motion, where delayability provides the interface between them. In fact, delayability is the key for a general characterisation of expression motions in $\mathcal{COEM}_\varphi$. This section provides some useful properties stressing this connection. As a first result we have:

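Lemma 3.4.1 (Delayability Lemma).

1. $\forall\, \dot n \in \dot N.\; \mathit{Delayed}_{\dot n} \Rightarrow \mathit{DnSafe}_{\dot n}$

2. $\forall\, \dot n \in \dot N.\; \mathit{BEM}_\varphi\text{-}\mathit{Correct}_{\dot n} \Rightarrow \mathit{Delayed}_{\dot n} \vee \mathit{LEM}_\varphi\text{-}\mathit{Correct}_{\dot n}$

Proof. Part (1) can be found as Lemma 3.16(1) in [KRS94a]. For the proof of the second part let us assume

$$\neg \mathit{Delayed}_{\dot n} \tag{3.1}$$

Then it is left to show that $\mathit{LEM}_\varphi\text{-}\mathit{Correct}_{\dot n}$ holds. The premise $\mathit{BEM}_\varphi\text{-}\mathit{Correct}_{\dot n}$ implies by definition

$$\forall\, p \in \mathbf{P}[\dot s, \dot n]\; \exists\, i \le |p|.\; \mathit{Earliest}_{p_i} \wedge \forall\, i \le j < |p|.\; \mathit{Transp}_{p_j}$$

As Definition 3.3.3 ensures that earliestness implies delayability, the above characterisation particularly yields

$$\forall\, p \in \mathbf{P}[\dot s, \dot n]\; \exists\, i \le |p|.\; \mathit{Delayed}_{p_i} \wedge \forall\, i \le j < |p|.\; \mathit{Transp}_{p_j} \tag{3.2}$$

Let us consider a path $p \in \mathbf{P}[\dot s, \dot n]$ that meets the condition of Property (3.2). Then, due to Assumption (3.1), there must be a largest index $i \le j < |p|$ satisfying $\mathit{Delayed}_{p_j}$. By the maximality of $j$ the successor $p_{j+1}$ is not delayed, so this also implies $\mathit{Latest}_{p_j}$, and Property (3.2) delivers

$$\forall\, p \in \mathbf{P}[\dot s, \dot n]\; \exists\, j \le |p|.\; \mathit{Latest}_{p_j} \wedge \forall\, j \le k < |p|.\; \mathit{Transp}_{p_k}$$

which means $\mathit{LEM}_\varphi\text{-}\mathit{Correct}_{\dot n}$, as desired. □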

The following lemma provides the announced strong characterisation of an arbitrary computationally optimal expression motion $\mathrm{EM}_\varphi$ in terms of delayability. While the first part guarantees that $\mathrm{EM}_\varphi$-insertions always lie within intervals of delayable program points, the second part ensures that there is exactly one computation of $\varphi$ (an insertion or a non-replaced original computation) inside such an interval.




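Lemma 3.4.2 (Computational Optimality Lemma).

Let $\mathrm{EM}_\varphi \in \mathcal{COEM}_\varphi$. Then we have

1. $\forall \dot n \in \dot N.\ \mathrm{EM}_\varphi\text{-}\mathit{Insert}_{\dot n} \Rightarrow \mathit{Delayed}_{\dot n}$

2. If $p$ is an interval of program points between the earliest and the latest program point, i.e. $p$ is a path with $\mathit{Earliest}_{p_1}$, $\mathit{Latest}_{p_{|p|}}$ and $\mathit{Delayed}_{p_k} \wedge \neg\mathit{Latest}_{p_k}$ for any $1 \le k < |p|$, then $p$ contains exactly one computation of $\varphi$ after an application of $\mathrm{EM}_\varphi$:

$$\exists!\, 1 \le i \le |p|.\ \mathrm{EM}_\varphi\text{-}\mathit{Insert}_{p_i} \vee (\mathit{Comp}_{p_i} \wedge \neg\mathrm{EM}_\varphi\text{-}\mathit{Replace}_{p_i})$$

Proof. Both parts are proved in [KRS94a], where they can be found in a slightly different formulation as Lemma 3.16(3) and Lemma 3.12(3). □

Remark 3.4.1. The condition in Lemma 3.4.2(2) can even be strengthened to

$$\exists!\, 1 \le i \le |p|.\ (\mathrm{EM}_\varphi\text{-}\mathit{Insert}_{p_i} \vee (\mathit{Comp}_{p_i} \wedge \neg\mathrm{EM}_\varphi\text{-}\mathit{Replace}_{p_i})) \wedge \forall i \le k \le |p|.\ \mathrm{EM}_\varphi\text{-}\mathit{Correct}_{p_k}$$

This is obvious for $k = i$. Otherwise, we proceed by induction, exploiting that no further insertions are allowed at program points $p_k$ with $k > i$.

Remark 3.4.1 ensures that $\mathrm{EM}_\varphi\text{-}\mathit{Correct}_{\dot n}$ holds for any program point $\dot n$ that is latest. Hence we in particular have:

Corollary 3.4.1 (Computational Optimality Corollary).

$$\forall \mathrm{EM}_\varphi \in \mathcal{COEM}_\varphi,\ \dot n \in \dot N.\ \mathrm{LEM}_\varphi\text{-}\mathit{Correct}_{\dot n} \Rightarrow \mathrm{EM}_\varphi\text{-}\mathit{Correct}_{\dot n}$$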

Figure 3.5 shows the range of program points sharing the delayability property for our running example. According to Lemma 3.4.2(1), this predicate also fixes the range to which any computationally optimal expression motion is restricted in its choice of initialisation points. Note, however, that not
every program point in this range is actually an insertion point of a computa- 
tionally optimal expression motion. For instance, an insertion at node 7 or at 
the synthetic node srp would establish a loop invariant being absent in the 
original program. In contrast, the synthetic node sy^n is a possible insertion 
point of a computationally optimal expression motion that is neither earliest 
nor latest. 



3.5 Computing $\mathrm{BEM}_\varphi$ and $\mathrm{LEM}_\varphi$

In this section we will briefly present the algorithms that are applied in order to actually compute busy and lazy expression motion. Both algorithms rely on the solution of appropriate unidirectional data flow analyses. The Boolean equation systems for both transformations are given in Table 3.1 and Table 3.2, respectively.

Fig. 3.5. Delayability as the range for valid insertions within $\mathcal{AEM}_{a+b}$

The equations are formulated in terms of a standard notation (cf. [MR79]), where "+", "·" and "overlining" denote disjunction, conjunction and negation, respectively. Predicates refer to nodes of the flow graph and are divided into an entry- and an exit-predicate, which is indicated by a preceding N and X, respectively. To distinguish the predicates involved in the fixed point iteration from the specified solutions, the former are emphasised in boldface. A proof of the coincidence of the computed solutions with their corresponding specified definitions is omitted, since it is straightforward along the lines of [KRS94a].



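1. Safety Analyses:

a) Down-Safety Analysis

$$\mathbf{NDNSAFE}_n = \mathit{Comp}_n + \mathit{Transp}_n \cdot \mathbf{XDNSAFE}_n$$

$$\mathbf{XDNSAFE}_n = \begin{cases} \mathit{false} & \text{if } n = \mathbf{e} \\ \prod_{m \in succ(n)} \mathbf{NDNSAFE}_m & \text{otherwise} \end{cases}$$

Greatest fixed point solution: $\mathit{NDnSafe}$ and $\mathit{XDnSafe}$

b) Up-Safety Analysis

$$\mathbf{NUPSAFE}_n = \begin{cases} \mathit{false} & \text{if } n = \mathbf{s} \\ \prod_{m \in pred(n)} \mathbf{XUPSAFE}_m & \text{otherwise} \end{cases}$$

$$\mathbf{XUPSAFE}_n = \mathit{Transp}_n \cdot (\mathit{Comp}_n + \mathbf{NUPSAFE}_n)$$

Greatest fixed point solution: $\mathit{NUpSafe}$ and $\mathit{XUpSafe}$

2. Computation of Earliestness: (No data flow analysis!)

$$\mathit{NEarliest}_n = \mathit{NDnSafe}_n \cdot \begin{cases} \mathit{true} & \text{if } n = \mathbf{s} \\ \sum_{m \in pred(n)} \overline{\mathit{XUpSafe}_m + \mathit{XDnSafe}_m} & \text{otherwise} \end{cases}$$

$$\mathit{XEarliest}_n = \mathit{XDnSafe}_n \cdot \overline{\mathit{Transp}_n}$$

3. Insertion and Replacement Points of $\mathrm{BEM}_\varphi$:

$$\mathrm{BEM}_\varphi\text{-}\mathit{NInsert}_n \;\overset{def}{\Leftrightarrow}\; \mathit{NEarliest}_n$$

$$\mathrm{BEM}_\varphi\text{-}\mathit{XInsert}_n \;\overset{def}{\Leftrightarrow}\; \mathit{XEarliest}_n$$

$$\mathrm{BEM}_\varphi\text{-}\mathit{Replace}_n \;\overset{def}{\Leftrightarrow}\; \mathit{Comp}_n$$

Table 3.1. Computing $\mathrm{BEM}_\varphi$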
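1. Perform steps 1) and 2) of Table 3.1.

2. Delayability Analysis:

$$\mathbf{NDELAYED}_n = \mathit{NEarliest}_n + \begin{cases} \mathit{false} & \text{if } n = \mathbf{s} \\ \prod_{m \in pred(n)} \mathbf{XDELAYED}_m & \text{otherwise} \end{cases}$$

$$\mathbf{XDELAYED}_n = \mathit{XEarliest}_n + \mathbf{NDELAYED}_n \cdot \overline{\mathit{Comp}_n}$$

Greatest fixed point solution: $\mathit{NDelayed}$ and $\mathit{XDelayed}$

3. Computation of Latestness: (No data flow analysis!)

$$\mathit{N\text{-}Latest}_n \;\overset{def}{\Leftrightarrow}\; \mathit{NDelayed}_n \cdot \mathit{Comp}_n$$

$$\mathit{X\text{-}Latest}_n \;\overset{def}{\Leftrightarrow}\; \mathit{XDelayed}_n \cdot \sum_{m \in succ(n)} \overline{\mathit{NDelayed}_m}$$

4. Insertion and Replacement Points of $\mathrm{LEM}_\varphi$:

$$\mathrm{LEM}_\varphi\text{-}\mathit{NInsert}_n \;\overset{def}{\Leftrightarrow}\; \mathit{N\text{-}Latest}_n$$

$$\mathrm{LEM}_\varphi\text{-}\mathit{XInsert}_n \;\overset{def}{\Leftrightarrow}\; \mathit{X\text{-}Latest}_n$$

$$\mathrm{LEM}_\varphi\text{-}\mathit{Replace}_n \;\overset{def}{\Leftrightarrow}\; \mathit{Comp}_n$$

Table 3.2. Computing $\mathrm{LEM}_\varphi$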
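Steps 2 to 4 of Table 3.2 can be sketched in the same style. To keep the example short, this hypothetical function takes the entry and exit earliestness predicates as inputs rather than recomputing steps 1 and 2 of Table 3.1. The graph encoding (a dict of successor lists) is our own assumption.

```python
# Sketch of steps 2-4 of Table 3.2 (illustrative only).
# Earliestness (n_earliest, x_earliest) is supplied by the caller.


def predecessors(succ):
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for m in targets:
            pred[m].append(n)
    return pred


def lazy_expression_motion(succ, start, comp, n_earliest, x_earliest):
    pred = predecessors(succ)
    nodes = list(succ)
    # 2) delayability: greatest fixed point, so iterate downwards from 'true'
    n_del = {n: True for n in nodes}
    x_del = {n: True for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new_n = n_earliest[n] or (n != start and all(x_del[m] for m in pred[n]))
            new_x = x_earliest[n] or (new_n and not comp[n])
            if (new_n, new_x) != (n_del[n], x_del[n]):
                n_del[n], x_del[n] = new_n, new_x
                changed = True
    # 3) latestness: no data flow analysis, only local checks
    n_latest = {n: n_del[n] and comp[n] for n in nodes}
    x_latest = {n: x_del[n] and any(not n_del[m] for m in succ[n]) for n in nodes}
    # 4) insertion ('N' = entry, 'X' = exit) and replacement points of LEM
    inserts = {("N", n) for n in nodes if n_latest[n]}
    inserts |= {("X", n) for n in nodes if x_latest[n]}
    replaces = {n for n in nodes if comp[n]}
    return inserts, replaces


# Hypothetical example: a+b computed at nodes 2 and 3, earliest point at the exit of node 1.
SUCC = {"s": ["1"], "1": ["2", "3"], "2": ["4"], "3": ["4"], "4": ["e"], "e": []}
COMP = {n: n in ("2", "3") for n in SUCC}
N_EARLIEST = {n: False for n in SUCC}
X_EARLIEST = {n: n == "1" for n in SUCC}

inserts, _ = lazy_expression_motion(SUCC, "s", COMP, N_EARLIEST, X_EARLIEST)
print(sorted(inserts))  # [('N', '2'), ('N', '3')]
```

In this example the delayable region extends from the exit of node 1 to the entries of nodes 2 and 3, so the initialisations are placed directly in front of the two original computations.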
3.6 The Complexity

As presented in the previous section, both the busy and the lazy single-expression approach to expression motion are based on unidirectional data flow analyses on the Boolean lattice. It is well known that such equation systems can be solved efficiently by an iterative workset algorithm [Hec77] whose workset is updated sparsely on demand, i.e. only elements that actually changed their value are added to the workset. This is demonstrated in Algorithm 3.6.1, using down-safety as a representative example.

Algorithm 3.6.1.

Input: An annotation of $N$ with $\mathbf{NDNSAFE}$ and $\mathbf{XDNSAFE}$ being initialised as follows:

$$\mathbf{NDNSAFE}_n := \mathit{true} \qquad \mathbf{XDNSAFE}_n := \begin{cases} \mathit{false} & \text{if } n = \mathbf{e} \\ \mathit{true} & \text{otherwise} \end{cases}$$

Output: The maximal solution to the equation system in Table 3.1(1).

    workset := N;
    while workset ≠ ∅ do
        let n ∈ workset;
        workset := workset \ {n};
        NDNSAFE_n := Comp_n + XDNSAFE_n · Transp_n;
        if ¬NDNSAFE_n then
            forall m ∈ pred(n) do
                if XDNSAFE_m then
                    XDNSAFE_m := false;
                    workset := workset ∪ {m}
                fi
            od
        fi
    od



Since the XDNSAFE-value of a node added to the workset is immediately set to false, and this value never changes back, each node is added to the workset at most once after the initialisation while the while-loop is processed. On the other hand, each element of the workset can be processed in time proportional to the number of its incident nodes. Thus we have the following result:

Theorem 3.6.1 (Complexity of $\mathrm{BEM}_\varphi$ and $\mathrm{LEM}_\varphi$). Both $\mathrm{BEM}_\varphi$ and $\mathrm{LEM}_\varphi$ can be performed with run-time complexity of order $O(|G|)$.
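A direct Python transcription of Algorithm 3.6.1 might look as follows. It is a sketch under the assumption that the flow graph is a dict of successor lists; the predecessor map is derived on the fly, and the counter `visits` is our own addition that merely illustrates the bound used in the complexity argument: since XDNSAFE can only change from true to false, every node enters the workset at most twice.

```python
# Sketch of Algorithm 3.6.1: down-safety with a sparsely updated workset.


def down_safety_workset(succ, end, comp, transp):
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for m in targets:
            pred[m].append(n)
    n_dn = {n: True for n in succ}        # NDNSAFE_n := true
    x_dn = {n: n != end for n in succ}    # XDNSAFE_e := false, true otherwise
    workset = set(succ)
    visits = 0                            # loop iterations, only for illustration
    while workset:
        n = workset.pop()
        visits += 1
        n_dn[n] = comp[n] or (x_dn[n] and transp[n])
        if not n_dn[n]:
            for m in pred[n]:
                if x_dn[m]:
                    # XDNSAFE only flips from true to false, so m is re-added at most once
                    x_dn[m] = False
                    workset.add(m)
    return n_dn, x_dn, visits


# Hypothetical example: a+b computed at nodes 2 and 3; node 1 modifies an operand.
SUCC = {"s": ["1"], "1": ["2", "3"], "2": ["4"], "3": ["4"], "4": ["e"], "e": []}
COMP = {n: n in ("2", "3") for n in SUCC}
TRANSP = {n: n != "1" for n in SUCC}

n_dn, x_dn, visits = down_safety_workset(SUCC, "e", COMP, TRANSP)
print(sorted(n for n in SUCC if n_dn[n]))  # ['2', '3']
```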




4. Optimal Expression Motion: The 
Multiple-Expression View 



In this chapter we demonstrate how the results of Chapter 3 can be profitably used for the development of algorithms that are able to cope with a set of program expressions simultaneously. The extension is almost straightforward if the universe of program expressions is flat, i.e. only contains expressions which are independent in terms of the subexpression relation. This, for instance, holds for intermediate-level programs whose expressions are completely decomposed into three-address format. However, decomposing large expressions comes at the price of weakening the potential for expression motion, as such a process introduces a number of new modification sites.¹

Nonetheless, the assumption of a flat universe of expressions has been fairly standard in expression motion since Morel's and Renvoise's seminal paper [MR79]: almost all relevant algorithms are based on the "separation paradigm" of Morel/Renvoise-like expression motion.

In this chapter we show how to go beyond this paradigm by examining expression motion with respect to arbitrary (structured) expression sets. Section 4.1 starts with a short introduction to the standard situation under a flat universe of expressions. Section 4.2 then introduces the notion of expression motion for structured universes of expressions and proves computational optimality for the structured variants of busy and lazy expression motion. Section 4.3 first provides an adequate notion of lifetime optimality incorporating trade-offs between the lifetime ranges of composite expressions and their subexpressions, and then presents the graph theoretical foundation for the algorithms presented in Sections 4.4 and 4.5. The latter section also contains the main result of the book: an efficient algorithm for computationally and lifetime optimal expression motion working on arbitrary structured universes of expressions.



4.1 Flat Expression Motion 

Expression motion of single expressions as investigated in the previous chapter can directly be applied to the simultaneous treatment of all program expressions from a flat universe of expressions $\Phi_{fl}$, i.e. a set of expressions such that

$$\forall \psi \in \Phi_{fl}.\ \mathit{SubExpr}^*_{\Phi_{fl}}(\psi) = \{\psi\}$$

¹ Fortunately, even in the case of decomposed expressions the ideas developed in this chapter are applicable using assignment motion (cf. Section 7.4).

O. Ruething: Interacting Code Motion Transformations, LNCS 1539, pp. 37-112, 1998.
© Springer-Verlag Berlin Heidelberg 1998



Definition 4.1.1 (Flat Expression Motion). Let $\{\mathrm{EM}_\varphi \mid \varphi \in \Phi_{fl}\}$ be a set of expression motions as introduced in Definition 3.1.1.

1. A flat expression motion $\mathrm{EM}_{\Phi_{fl}}$ with respect to $\Phi_{fl}$ is a program transformation that

a) inserts initialisations at program points that are determined by the insert predicates of the individual transformations, i.e. for any $\psi \in \Phi_{fl}$ and $\dot n \in \dot N$

$$\mathrm{EM}_{\Phi_{fl}}\text{-}\mathit{Insert}^{\psi}_{\dot n} \;\overset{def}{\Leftrightarrow}\; \mathrm{EM}_\psi\text{-}\mathit{Insert}_{\dot n}$$

b) replaces the original occurrences of expression patterns that are determined by the replacement predicates of the individual transformations, i.e. for any $\psi \in \Phi_{fl}$ and $\dot n \in \dot N$

$$\mathrm{EM}_{\Phi_{fl}}\text{-}\mathit{Replace}^{\psi}_{\dot n} \;\overset{def}{\Leftrightarrow}\; \mathrm{EM}_\psi\text{-}\mathit{Replace}_{\dot n}$$

2. A flat expression motion $\mathrm{EM}_{\Phi_{fl}}$ is admissible if and only if each component transformation $\mathrm{EM}_\varphi$ is admissible (in the sense of Definition 3.1.3).

It should be particularly noted that multiple insertions at the same program point can be done in an arbitrary order.

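4.1.1 Flat Busy and Lazy Expression Motion

The optimality criteria $\lesssim^{exp}_\varphi$ and $\lesssim^{lt}_\varphi$ of Section 3.2.1 and Section 3.3.1 can be naturally extended to $\Phi_{fl}$, yielding corresponding notions $\lesssim^{exp}_{\Phi_{fl}}$ and $\lesssim^{lt}_{\Phi_{fl}}$, respectively:

- $\mathrm{EM}_{\Phi_{fl}} \lesssim^{exp}_{\Phi_{fl}} \mathrm{EM}'_{\Phi_{fl}} \;\overset{def}{\Leftrightarrow}\; \forall \varphi \in \Phi_{fl}.\ \mathrm{EM}_\varphi \lesssim^{exp}_\varphi \mathrm{EM}'_\varphi$

- $\mathrm{EM}_{\Phi_{fl}} \lesssim^{lt}_{\Phi_{fl}} \mathrm{EM}'_{\Phi_{fl}} \;\overset{def}{\Leftrightarrow}\; \forall \varphi \in \Phi_{fl}.\ \mathrm{EM}_\varphi \lesssim^{lt}_\varphi \mathrm{EM}'_\varphi$

Moreover, this also induces notions of computational and lifetime optimality, respectively. Denoting the flat versions of busy and lazy expression motion with respect to $\Phi_{fl}$ by $\mathrm{BEM}_{\Phi_{fl}}$ and $\mathrm{LEM}_{\Phi_{fl}}$, respectively, we have according to Theorem 3.2.1 and Theorem 3.3.1:

Theorem 4.1.1 (Flat Expression Motion Theorem).

1. $\mathrm{BEM}_{\Phi_{fl}}$ is computationally optimal with respect to $\Phi_{fl}$.

2. $\mathrm{LEM}_{\Phi_{fl}}$ is computationally and lifetime optimal with respect to $\Phi_{fl}$.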




4.2 Structured Expression Motion

In this section we will go beyond the restrictive assumption of a flat universe of program expressions, i.e. we are now faced with universes containing composite expressions as well as their subexpressions. As we will see, this does not seriously influence the reasoning on computational optimality. In fact, computationally optimal expression motions can still be obtained quite easily by means of the individual transformations of Chapter 3. On the other hand, the situation changes significantly with respect to lifetime considerations. This observation did not enter the reasoning on the register pressure problem in expression motion papers [Bri92a]. Throughout the remainder of this chapter let us fix a structured universe of expressions $\Phi$, i.e. a set of expressions with

$$\forall \psi \in \Phi,\ \varphi \in \mathit{SubExpr}(\psi).\ \mathit{SubExpr}^*_\Phi(\varphi) \neq \emptyset \Rightarrow \varphi \in \Phi \qquad (4.1)$$



This condition excludes gaps in the immediate subexpression relation on $\Phi$. In particular, it ensures that the minimal expressions in $\Phi$ are flat. It should be noted that Assumption (4.1) does not impose any restriction with respect to the usual application situation. We could even do without this assumption, however only at the price of a highly technical presentation. Finally, we introduce two further notions. The (relative) level of an expression pattern $\psi \in \Phi$ is defined by

$$\mathit{Lev}_\Phi(\psi) = \begin{cases} 0 & \text{if } \mathit{SubExpr}_\Phi(\psi) = \emptyset \\ 1 + \max_{\varphi \in \mathit{SubExpr}_\Phi(\psi)} \mathit{Lev}_\Phi(\varphi) & \text{otherwise} \end{cases}$$

Moreover, let $\Phi_i$, $\Phi_{\le i}$ and $\Phi_{<i}$ refer to the expression patterns of $\Phi$ with level $i$, with levels lower than or equal to $i$, and with levels strictly lower than $i$, respectively. Obviously, $\Phi_{\le i}$ and $\Phi_{<i}$ are also structured sets of expressions, i.e. they meet the condition of Assumption (4.1) with $\Phi_{\le i}$ or $\Phi_{<i}$ in place of $\Phi$, respectively.

For an expression $\psi$, the maximal occurrences of subexpressions that are part of $\Phi$ are defined by

$$\mathit{MaxSubExpr}_\Phi(\psi) = \begin{cases} \{\psi\} & \text{if } \psi \in \Phi \\ \bigcup_{\varphi \in \mathit{SubExpr}(\psi)} \mathit{MaxSubExpr}_\Phi(\varphi) & \text{otherwise} \end{cases}$$

Remark 4.2.1. Note that there might be both maximal and non-maximal occurrences of $\varphi \in \mathit{MaxSubExpr}_\Phi(\psi)$ with respect to a given expression $\psi$. For instance, for $\psi \overset{def}{=} (a+b) * ((a+b)+c)$ and $\Phi \overset{def}{=} \{a+b,\ (a+b)+c\}$ we have $\mathit{MaxSubExpr}_\Phi(\psi) = \{a+b,\ (a+b)+c\}$. However, only the first occurrence of $a+b$ in $\psi$ is actually maximal, as the second one is covered by the expression $(a+b)+c$. Nevertheless, for a program expression under consideration the maximal occurrences can be marked as a by-product of the recursive definition.
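The recursive definitions of the level and of MaxSubExpr translate almost literally into code. In the following Python sketch the encoding is our own assumption: variables are strings, composite expressions are (operator, left, right) tuples, and $\Phi$ is a set of such tuples. Recording the operand path of each hit illustrates how the maximal occurrences of Remark 4.2.1 fall out of the recursion as a by-product.

```python
# Sketch of Lev_Phi and MaxSubExpr_Phi on expression trees (hypothetical encoding).


def operands(e):
    """Immediate operands of a composite expression; variables have none."""
    return [] if isinstance(e, str) else [e[1], e[2]]


def level(psi, phi):
    """Lev_Phi(psi): 0 for minimal patterns, else 1 + max level of operands in Phi."""
    subs = [x for x in operands(psi) if x in phi]
    return 0 if not subs else 1 + max(level(x, phi) for x in subs)


def maximal_occurrences(psi, phi, path=()):
    """List (path, pattern) for every maximal occurrence of a Phi-pattern in psi.

    A path is the sequence of operand positions (1 = left, 2 = right) from the root.
    The recursion stops at the first pattern in Phi, so covered occurrences are skipped.
    """
    if psi in phi:
        return [(path, psi)]
    return [occ
            for i, x in enumerate(operands(psi), start=1)
            for occ in maximal_occurrences(x, phi, path + (i,))]


def max_sub_expr(psi, phi):
    return {pattern for _, pattern in maximal_occurrences(psi, phi)}


# Example of Remark 4.2.1: psi = (a+b) * ((a+b)+c), Phi = {a+b, (a+b)+c}
AB = ("+", "a", "b")
ABC = ("+", AB, "c")
PSI = ("*", AB, ABC)
PHI = {AB, ABC}

print(maximal_occurrences(PSI, PHI))
# [((1,), ('+', 'a', 'b')), ((2,), ('+', ('+', 'a', 'b'), 'c'))]
```

The inner occurrence of a + b at path (2, 1) never shows up, because the recursion stops at its covering pattern (a + b) + c.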



4.2.1 Defining Structured Expression Motion

Essentially, expression motions with respect to $\Phi$ are still composed of the individual transformations as introduced in Chapter 3. However, as an additional constraint, initialisations with respect to large expressions are required to be preceded by the initialisations belonging to their subexpressions, and replacements of large expressions at a program point must force the replacement of all their subexpressions. For both the right-hand sides of initialisations and the replacements of original expressions we have to ensure that the largest subexpressions available at the program point are actually replaced. More formally, this reads as follows:

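Definition 4.2.1 (Structured Expression Motion).

1. A structured expression motion $\mathrm{EM}_\Phi$ with respect to $\Phi$ is a transformation composed of a set of individual expression motion transformations $\{\mathrm{EM}_\varphi \mid \varphi \in \Phi\}$ satisfying the following constraints:

- $\forall \psi \in \Phi,\ \dot n \in \dot N.\ \mathrm{EM}_\psi\text{-}\mathit{Insert}_{\dot n} \Rightarrow \forall \varphi \in \mathit{SubExpr}_\Phi(\psi).\ \mathrm{EM}_\varphi\text{-}\mathit{Correct}_{\dot n}$

- $\forall \psi \in \Phi,\ \dot n \in \dot N.\ \mathrm{EM}_\psi\text{-}\mathit{Replace}_{\dot n} \Rightarrow \forall \varphi \in \mathit{SubExpr}_\Phi(\psi).\ \mathrm{EM}_\varphi\text{-}\mathit{Replace}_{\dot n}$

This induces insertions and replacements of the following kind:

a) The insertion points are determined by the insertion points of the individual transformations, i.e. for any $\psi \in \Phi$ and $\dot n \in \dot N$ we have:

- $\mathrm{EM}_\Phi\text{-}\mathit{Insert}^{\psi}_{\dot n} \;\overset{def}{\Leftrightarrow}\; \mathrm{EM}_\psi\text{-}\mathit{Insert}_{\dot n}$

Insertions wrt. $\psi \in \Phi$ are built according to the following rules:

i. Initialisation statements are of the form $h_\psi := \psi'$, where $\psi'$ results from $\psi$ by replacing every operand $\varphi \in \mathit{SubExpr}_\Phi(\psi)$ by $h_\varphi$. If there is an initialisation of $h_\varphi$ at the same program point, then it has to precede the initialisation of $h_\psi$.

ii. Except for the ordering constraint imposed in (1(a)i), multiple insertions at a program point can be ordered arbitrarily.

b) Replacements of $\mathrm{EM}_\Phi$ are restricted to maximal occurrences, i.e. for any $\varphi \in \Phi$ and $\dot n \in \dot N$ we have:

- $\mathrm{EM}_\Phi\text{-}\mathit{Replace}^{\varphi}_{\dot n} \;\overset{def}{\Leftrightarrow}\; \varphi \in \mathit{MaxSubExpr}_{\Phi^{\dot n}_{Repl}}(\varphi^{\dot n}_{RHS})$,

where $\varphi^{\dot n}_{RHS}$ addresses the right-hand side expression that belongs to $\dot n$ and $\Phi^{\dot n}_{Repl} \overset{def}{=} \{\varphi' \in \Phi \mid \mathrm{EM}_{\varphi'}\text{-}\mathit{Replace}_{\dot n}\}$. For an expression $\varphi$ that satisfies $\mathrm{EM}_\Phi\text{-}\mathit{Replace}^{\varphi}_{\dot n}$, some original occurrences in $\varphi^{\dot n}_{RHS}$, namely the maximal ones, are replaced by $h_\varphi$ (see Remark 4.2.1 for details).

2. A structured expression motion $\mathrm{EM}_\Phi$ is admissible if and only if every component transformation $\mathrm{EM}_\varphi$ is admissible (in the sense of Definition 3.1.3).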
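Rule 1(a)i of Definition 4.2.1 can be illustrated with a small Python sketch that builds the initialisation statements inserted at a single program point. The expression encoding and the temporary names h1, h2 are hypothetical. Ordering the patterns by their level is one simple way to satisfy the ordering constraint, since every operand in $\Phi$ has a strictly smaller level than the expression containing it.

```python
# Sketch of rule 1(a)i of Definition 4.2.1 (hypothetical encoding and names):
# variables are strings, composite expressions are (operator, left, right) tuples.


def operands(e):
    return [] if isinstance(e, str) else [e[1], e[2]]


def show(e):
    if isinstance(e, str):
        return e
    op, left, right = e
    return f"({show(left)} {op} {show(right)})"


def level(psi, phi):
    subs = [x for x in operands(psi) if x in phi]
    return 0 if not subs else 1 + max(level(x, phi) for x in subs)


def initialisations(inserted, phi, temp):
    """Return 'h_psi := psi'' statements for the patterns inserted at one point.

    Every operand belonging to Phi is replaced by its temporary. Sorting by level
    puts the initialisation of a subexpression before those that use it; the
    secondary key only makes the otherwise arbitrary order deterministic.
    """
    def operand_text(x):
        return temp[x] if x in phi else show(x)

    stmts = []
    for psi in sorted(inserted, key=lambda e: (level(e, phi), show(e))):
        op, left, right = psi
        stmts.append(f"{temp[psi]} := {operand_text(left)} {op} {operand_text(right)}")
    return stmts


AB = ("+", "a", "b")
ABC = ("+", AB, "c")
PHI = {AB, ABC}
TEMP = {AB: "h1", ABC: "h2"}  # hypothetical names for h_{a+b} and h_{(a+b)+c}

print(initialisations({ABC, AB}, PHI, TEMP))  # ['h1 := a + b', 'h2 := h1 + c']
```

Inserting only (a + b) + c at a point yields just `h2 := h1 + c`; this is well-formed only because the first constraint of Definition 4.2.1 guarantees that h1 is correctly initialised there.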




Figure 4.1 shows an admissible expression motion with respect to the set of expressions $\{(a+b)+c,\ a+b\}$. Let us denote the set of admissible structured expression motions with respect to $\Phi$ by $\mathcal{AEM}_\Phi$.



[Fig. 4.1, panels a) and b)]

Fig. 4.1. Admissible structured expression motion with respect to the universe of expressions $\Phi = \{a+b,\ (a+b)+c\}$



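4.2.2 Computationally Optimal Structured Expression Motion

The preorder $\lesssim^{exp}_\varphi$ on $\mathcal{AEM}_\varphi$ can be extended to a preorder $\lesssim^{exp}_\Phi$ on $\mathcal{AEM}_\Phi$ in the same way as in Section 4.1:

- $\mathrm{EM}_\Phi \lesssim^{exp}_\Phi \mathrm{EM}'_\Phi \;\overset{def}{\Leftrightarrow}\; \forall \varphi \in \Phi.\ \mathrm{EM}_\varphi \lesssim^{exp}_\varphi \mathrm{EM}'_\varphi$

This straightforward extension is feasible, since a structured expression motion results in exactly the same number of $\varphi$-computations on any path as its individual component for $\varphi$ does. In fact, one can easily check that every $\varphi$-initialisation of $\mathrm{EM}_\Phi$ corresponds to a $\varphi$-initialisation of $\mathrm{EM}_\varphi$ and vice versa. Similarly, every original computation of $\varphi$ that is not replaced by $\mathrm{EM}_\Phi$ is not replaced by $\mathrm{EM}_\varphi$ and vice versa. The preorder $\lesssim^{exp}_\Phi$ also induces a corresponding notion of computational optimality. Let us denote the set of computationally optimal expression motions among $\mathcal{AEM}_\Phi$ by $\mathcal{COEM}_\Phi$.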

Although the definition of computational optimality is particularly simple, it is not immediately obvious that $\mathcal{COEM}_\Phi$ is non-empty. Fortunately, we can easily obtain a computationally optimal structured expression motion by using either busy expression motion or lazy expression motion for the individual components. This was also noticed by Drechsler and Stadel [DS93], who presented an adaptation of our algorithm for lazy expression motion that deals with multiple expressions. Unfortunately, however, their presentation suggests that lifetime optimality carries over from the individual transformations, too. This is based on a purely technical view of lifetime ranges that is not adequate. In Section 4.2.3 we will discuss a reasonable notion of lifetime ranges and present an algorithm that adequately deals with the lifetimes of multiple temporaries simultaneously. Before that, however, let us first investigate the easier task of obtaining computational optimality and present the structured variants of both busy and lazy expression motion.

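4.2.2.1 Structured Busy Expression Motion. All properties on the interdependencies between predicate values of large expressions and their subexpressions are based on the following trivial relationship among the local properties:

Lemma 4.2.1 (Structured Local Property Lemma).

1. $\forall \psi \in \Phi,\ \varphi \in \mathit{SubExpr}_\Phi(\psi),\ \dot n \in \dot N.\ \mathit{Transp}^{\psi}_{\dot n} \Rightarrow \mathit{Transp}^{\varphi}_{\dot n}$

2. $\forall \psi \in \Phi,\ \varphi \in \mathit{SubExpr}_\Phi(\psi),\ \dot n \in \dot N.\ \mathit{Comp}^{\psi}_{\dot n} \Rightarrow \mathit{Comp}^{\varphi}_{\dot n}$

A central property for combining the individual busy expression motion transformations in a structured way is that subexpressions are essentially more mobile with respect to hoistings than their superexpressions. In terms of the safety predicates this reads as:

Lemma 4.2.2 (Structured Safety Lemma).

1. $\forall \psi \in \Phi,\ \varphi \in \mathit{SubExpr}_\Phi(\psi),\ \dot n \in \dot N.\ \mathit{UpSafe}^{\psi}_{\dot n} \Rightarrow \mathit{UpSafe}^{\varphi}_{\dot n}$

2. $\forall \psi \in \Phi,\ \varphi \in \mathit{SubExpr}_\Phi(\psi),\ \dot n \in \dot N.\ \mathit{DnSafe}^{\psi}_{\dot n} \Rightarrow \mathit{DnSafe}^{\varphi}_{\dot n}$

Proof. The lemma trivially follows from the definition of up- and down-safety (cf. Definition 3.1.2) and Lemma 4.2.1. □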

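For the upper borderline of the region of safe program points, the earliest program points, we additionally have:

Lemma 4.2.3 (Structured Earliestness Lemma).

$$\forall \psi \in \Phi,\ \varphi \in \mathit{SubExpr}_\Phi(\psi),\ \dot n \in \dot N.\ \mathit{Earliest}^{\psi}_{\dot n} \Rightarrow \mathrm{BEM}_\varphi\text{-}\mathit{Correct}_{\dot n}$$

Proof. We have

$\mathit{Earliest}^{\psi}_{\dot n}$
$\Rightarrow$ [Definition Earliest] $\mathit{Safe}^{\psi}_{\dot n}$
$\Rightarrow$ [Lemma 3.1.1 & 4.2.2] $\mathit{Safe}^{\varphi}_{\dot n}$
$\Rightarrow$ [Lemma 3.2.1] $\mathrm{BEM}_\varphi\text{-}\mathit{Correct}_{\dot n}$ □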
Lemma 4.2.3 is sufficient to conclude that the individual busy expression motion transformations induce a structured expression motion in the sense of Definition 4.2.1. Moreover, admissibility and computational optimality of the individual transformations trivially carry over to the structured variant, yielding:

Theorem 4.2.1 (Structured Busy Expression Motion Theorem).

$\mathrm{BEM}_\Phi$ is

1. admissible, i.e. $\mathrm{BEM}_\Phi \in \mathcal{AEM}_\Phi$,

2. computationally optimal with respect to $\Phi$, i.e. $\mathrm{BEM}_\Phi \in \mathcal{COEM}_\Phi$.

4.2.2.2 Structured Lazy Expression Motion. The structured version of busy expression motion presented in the previous section evolved as the natural combination of its individual components. A dual optimal structured expression motion is induced by the combination of lazy component transformations. As mentioned, the intuitive reason for the well-behavedness of $\mathrm{BEM}_\Phi$ is the fact that large expressions are less mobile with respect to hoistings than their subexpressions. With regard to sinkings, which are based on the delayability predicate, the situation is reversed, as now large expressions are more mobile than their subexpressions. Intuitively, this is because original occurrences of large expressions are blockades for all their subexpressions. A more formal argument is given in the forthcoming Lemma 4.2.5. Before that, we shall investigate a result about correctness predicates of structured expression motions.

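Lemma 4.2.4 (Structured Correctness Lemma).

Let $\mathrm{EM}_\Phi \in \mathcal{AEM}_\Phi$, $\psi \in \Phi$, $\varphi \in \mathit{SubExpr}_\Phi(\psi)$ and $\dot n \in \dot N$. Then

$$\mathrm{EM}_\psi\text{-}\mathit{Correct}_{\dot n} \Rightarrow \mathrm{EM}_\varphi\text{-}\mathit{Correct}_{\dot n}$$

Proof.

$\mathrm{EM}_\psi\text{-}\mathit{Correct}_{\dot n}$
$\Rightarrow$ [Definition $\mathrm{EM}_\psi$-Correct]
$\forall p \in \mathbf{P}[\mathbf{s},\dot n]\ \exists i \le |p|.\ \mathrm{EM}_\psi\text{-}\mathit{Insert}_{p_i} \wedge \forall i \le j < |p|.\ \mathit{Transp}^{\psi}_{p_j}$
$\Rightarrow$ [Lemma 4.2.1(1)]
$\forall p \in \mathbf{P}[\mathbf{s},\dot n]\ \exists i \le |p|.\ \mathrm{EM}_\psi\text{-}\mathit{Insert}_{p_i} \wedge \forall i \le j < |p|.\ \mathit{Transp}^{\varphi}_{p_j}$
$\Rightarrow$ [Definition 4.2.1(1)]
$\forall p \in \mathbf{P}[\mathbf{s},\dot n]\ \exists i \le |p|.\ \mathrm{EM}_\varphi\text{-}\mathit{Correct}_{p_i} \wedge \forall i \le j < |p|.\ \mathit{Transp}^{\varphi}_{p_j}$
$\Rightarrow$ [Definition $\mathrm{EM}_\varphi$-Correct]
$\forall p \in \mathbf{P}[\mathbf{s},\dot n]\ \exists j \le |p|.\ \mathrm{EM}_\varphi\text{-}\mathit{Insert}_{p_j} \wedge \forall j \le k < |p|.\ \mathit{Transp}^{\varphi}_{p_k}$
$\Rightarrow$ [Definition $\mathrm{EM}_\varphi$-Correct]
$\mathrm{EM}_\varphi\text{-}\mathit{Correct}_{\dot n}$ □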
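Based on Lemma 4.2.4 we obtain the complement to Lemma 4.2.2.

Lemma 4.2.5 (Structured Delayability Lemma).

Let $\psi \in \Phi$, $\varphi \in \mathit{SubExpr}_\Phi(\psi)$ and $\dot n \in \dot N$. Then

1. $\mathit{Delayed}^{\psi}_{\dot n} \Rightarrow \mathit{Delayed}^{\varphi}_{\dot n} \vee \mathrm{LEM}_\varphi\text{-}\mathit{Correct}_{\dot n}$

2. $\mathit{Delayed}^{\varphi}_{\dot n} \wedge \mathit{DnSafe}^{\psi}_{\dot n} \Rightarrow \mathit{Delayed}^{\psi}_{\dot n}$

3. $\mathit{Delayed}^{\psi}_{\dot n} \wedge \neg\mathit{Earliest}^{\psi}_{\dot n} \Rightarrow \neg\mathit{Earliest}^{\varphi}_{\dot n}$

Proof. Considering part (1), let us assume

$$\neg\mathrm{LEM}_\varphi\text{-}\mathit{Correct}_{\dot n} \qquad (\dagger)$$

Then

$\mathit{Delayed}^{\psi}_{\dot n}$
$\Rightarrow$ [Lemma 3.4.1(1)] $\mathit{DnSafe}^{\psi}_{\dot n}$
$\Rightarrow$ [Lemma 3.2.1] $\mathrm{BEM}_\psi\text{-}\mathit{Correct}_{\dot n}$
$\Rightarrow$ [Lemma 4.2.4] $\mathrm{BEM}_\varphi\text{-}\mathit{Correct}_{\dot n}$
$\Rightarrow$ [Assumption (†) & Lemma 3.4.1(2)] $\mathit{Delayed}^{\varphi}_{\dot n}$

For the proof of the second part we have

$\mathit{DnSafe}^{\psi}_{\dot n}$
$\Rightarrow$ [Def. DnSafe & Earliest]
$\forall p \in \mathbf{P}[\mathbf{s},\dot n]\ \exists i \le |p|.\ \mathit{Earliest}^{\psi}_{p_i} \wedge \forall i \le j < |p|.\ \mathit{Transp}^{\psi}_{p_j} \wedge \mathit{DnSafe}^{\psi}_{p_j}$
$\Rightarrow$ [(4.2)]
$\forall p \in \mathbf{P}[\mathbf{s},\dot n]\ \exists i \le |p|.\ \mathit{Earliest}^{\psi}_{p_i} \wedge \forall i \le j < |p|.\ \neg\mathit{Comp}^{\psi}_{p_j}$
$\Rightarrow$ [Def. Delayed]
$\mathit{Delayed}^{\psi}_{\dot n}$

There remains the proof of Implication (4.2). To this end let us assume a path $p \in \mathbf{P}[\mathbf{s},\dot n]$ and an index $i \le |p|$ such that

$$\mathit{Earliest}^{\psi}_{p_i} \wedge \forall i \le j < |p|.\ \mathit{Transp}^{\psi}_{p_j} \wedge \mathit{DnSafe}^{\psi}_{p_j}$$

Suppose there is an index $i \le j < |p|$ such that $\mathit{Comp}^{\psi}_{p_j}$ holds. According to Lemma 4.2.1(2) this implies $\mathit{Comp}^{\varphi}_{p_j}$. Exploiting the other condition of the premise, namely that $\mathit{Delayed}^{\varphi}_{\dot n}$ holds, there must be an index $l$ with $j < l \le |p|$ satisfying $\mathit{Earliest}^{\varphi}_{p_l}$. By the definition of earliestness (see Definition 3.2.2 and Remark 3.2.1) this implies either $\neg\mathit{Transp}^{\varphi}_{p_{l-1}}$ or $\neg\mathit{Safe}^{\varphi}_{p_{l-1}}$. According to Lemma 4.2.1(1), the first case contradicts the assumption $\mathit{Transp}^{\psi}_{p_{l-1}}$. Furthermore, according to Lemma 4.2.2, the second case would imply $\neg\mathit{Safe}^{\psi}_{p_{l-1}}$, which contradicts the assumption $\mathit{DnSafe}^{\psi}_{p_{l-1}}$.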
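For the third part we have

$\neg\mathit{Earliest}^{\psi}_{\dot n} \wedge \mathit{Delayed}^{\psi}_{\dot n}$
$\Rightarrow$ [Definition Delayed & Latest]
$\forall m \in pred(\dot n).\ \mathit{Delayed}^{\psi}_{m} \wedge \neg\mathit{Latest}^{\psi}_{m}$
$\Rightarrow$ [Lemma 3.4.1(1) & Definition Latest]
$\forall m \in pred(\dot n).\ \mathit{DnSafe}^{\psi}_{m} \wedge \neg\mathit{Comp}^{\psi}_{m}$
$\Rightarrow$ [Definition DnSafe]
$\forall m \in pred(\dot n).\ \mathit{DnSafe}^{\psi}_{m} \wedge \mathit{Transp}^{\psi}_{m}$
$\Rightarrow$ [Lemma 4.2.1(1) & 4.2.2(2)]
$\forall m \in pred(\dot n).\ \mathit{DnSafe}^{\varphi}_{m} \wedge \mathit{Transp}^{\varphi}_{m}$
$\Rightarrow$ [Lemma 3.2.1]
$\forall m \in pred(\dot n).\ \mathrm{BEM}_\varphi\text{-}\mathit{Correct}_{m} \wedge \mathit{Transp}^{\varphi}_{m}$
$\Rightarrow$ [$\mathrm{BEM}_\varphi \in \mathcal{COEM}_\varphi$]
$\neg\mathit{Earliest}^{\varphi}_{\dot n}$ □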

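Finally, as an application of Lemma 4.2.5 we get:

Lemma 4.2.6 (Structured Latestness Lemma).

$$\forall \psi \in \Phi,\ \varphi \in \mathit{SubExpr}_\Phi(\psi),\ \dot n \in \dot N.\ \mathit{Latest}^{\psi}_{\dot n} \Rightarrow \mathrm{LEM}_\varphi\text{-}\mathit{Correct}_{\dot n}$$

Proof. By definition $\mathit{Latest}^{\psi}_{\dot n}$ implies

$$\mathit{Delayed}^{\psi}_{\dot n} \wedge (\mathit{Comp}^{\psi}_{\dot n} \vee \exists m \in succ(\dot n).\ \neg\mathit{Delayed}^{\psi}_{m})$$

We investigate both cases of the disjunction separately. First we have

$\mathit{Delayed}^{\psi}_{\dot n} \wedge \mathit{Comp}^{\psi}_{\dot n}$
$\Rightarrow$ [Lemma 4.2.5(1)] $(\mathit{Delayed}^{\varphi}_{\dot n} \vee \mathrm{LEM}_\varphi\text{-}\mathit{Correct}_{\dot n}) \wedge \mathit{Comp}^{\psi}_{\dot n}$
$\Rightarrow$ [Lemma 4.2.1(2)] $(\mathit{Delayed}^{\varphi}_{\dot n} \vee \mathrm{LEM}_\varphi\text{-}\mathit{Correct}_{\dot n}) \wedge \mathit{Comp}^{\varphi}_{\dot n}$
$\Rightarrow$ [Definition Latest] $\mathit{Latest}^{\varphi}_{\dot n} \vee \mathrm{LEM}_\varphi\text{-}\mathit{Correct}_{\dot n}$
$\Rightarrow$ [Definition $\mathrm{LEM}_\varphi$-Correct] $\mathrm{LEM}_\varphi\text{-}\mathit{Correct}_{\dot n}$

Otherwise, we may assume $\neg\mathit{Comp}^{\psi}_{\dot n}$ and argue as follows:

$\mathit{Delayed}^{\psi}_{\dot n} \wedge \exists m \in succ(\dot n).\ \neg\mathit{Delayed}^{\psi}_{m}$
$\Rightarrow$ [Lemma 3.4.1(1)] $\mathit{DnSafe}^{\psi}_{\dot n} \wedge \exists m \in succ(\dot n).\ \neg\mathit{Delayed}^{\psi}_{m}$
$\Rightarrow$ [Definition DnSafe & Assumption $\neg\mathit{Comp}^{\psi}_{\dot n}$] $\exists m \in succ(\dot n).\ \mathit{DnSafe}^{\psi}_{m} \wedge \neg\mathit{Delayed}^{\psi}_{m}$
$\Rightarrow$ [Lemma 4.2.5(2)] $\exists m \in succ(\dot n).\ \neg\mathit{Delayed}^{\varphi}_{m}$ $\qquad (\star)$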






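As $\mathit{Latest}^{\psi}_{\dot n}$ implies $\mathit{Delayed}^{\psi}_{\dot n}$, Lemma 4.2.5(1) yields either $\mathrm{LEM}_\varphi\text{-}\mathit{Correct}_{\dot n}$ or $\mathit{Delayed}^{\varphi}_{\dot n}$. In the first case there is nothing to show, whereas in the other case Property (★) immediately yields $\mathit{Latest}^{\varphi}_{\dot n}$, which in particular implies $\mathrm{LEM}_\varphi\text{-}\mathit{Correct}_{\dot n}$, too. □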

Lemma 4.2.6 is the reason that the individual lazy expression motion transformations can be combined into a structured variant in the sense of Definition 4.2.1. Moreover, as for $\mathrm{BEM}_\Phi$, admissibility and computational optimality carry over directly from the individual transformations. Hence we obtain:

Theorem 4.2.2 (Structured Lazy Expression Motion Theorem).

$\mathrm{LEM}_\Phi$ is

1. admissible, i.e. $\mathrm{LEM}_\Phi \in \mathcal{AEM}_\Phi$,

2. computationally optimal with respect to $\Phi$, i.e. $\mathrm{LEM}_\Phi \in \mathcal{COEM}_\Phi$.



4.2.3 Lifetime Optimal Structured Expression Motion

Theorem 4.2.2 gives rise to the question whether $\mathrm{LEM}_\Phi$ also preserves the lifetime optimality of its component transformations. However, unlike $\lesssim^{exp}_\Phi$, the preorder $\lesssim^{lt}_\Phi$ cannot be composed straightforwardly out of the preorders $\lesssim^{lt}_\varphi$ associated with the single expressions. Intuitively, the lack of compositionality is due to dependencies between the lifetimes associated with large expressions and those of their subexpressions. The following section is devoted to this phenomenon.

4.2.3.1 Trade-Offs between the Lifetimes of Temporaries. To illustrate the problem let us consider the program in Figure 4.2.² The point of this example is that the structured extension of lazy expression motion would suggest initialising the large term $a*b+c*d$ as well as its subterms $a*b$ and $c*d$ as late as possible. However, this is not a good choice in terms of the "overall" lifetimes of temporaries. Using this strategy, two temporaries are necessary for keeping the values of the proper subterms $a*b$ and $c*d$. It should be noted that both operand expressions have to be evaluated prior to their original occurrences at node 1 (see Figure 4.2(b)). A better choice in terms of the overall lifetimes would be to initialise the large expression as early as possible. This way only the value of the large expression has to be stored in a temporary along the path from node 1 to node 2, as illustrated in Figure 4.2(c).

This example shows that we have to reconsider the notion of lifetimes of temporaries. The main difference in comparison to the flat approach is that lifetimes may now end not only at original computations, but also at initialisation sites of larger original expressions. In other words, the choice of a particular insertion point for a complex expression does not only affect the lifetime of the associated temporary, but may also influence the lifetimes of the temporaries of its subexpressions. To come to a reasonable notion of lifetimes in the case of structured expression motion, let us first reconsider the notion of insertion-replacement paths as given in Section 3.3.1. For the reasons explained above, we replace this notion by a notion of insertion-use paths.

[Fig. 4.2, panels a), b) and c)]

² For brevity we use $h_1$, $h_2$ and $h_3$ instead of $h_{a*b}$, $h_{c*d}$ and $h_{a*b+c*d}$, respectively.

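$$\mathit{IUP}(\mathrm{EM}_\Phi, \varphi) \overset{def}{=} \{p \in \mathbf{P} \mid |p| \ge 2 \wedge \underbrace{\mathrm{EM}_\varphi\text{-}\mathit{Insert}_{p_1}}_{i)} \wedge \underbrace{\mathit{Used}^{\varphi}_{p_{|p|}}}_{ii)} \wedge \underbrace{\forall 1 < i \le |p|.\ \neg\mathrm{EM}_\varphi\text{-}\mathit{Insert}_{p_i}}_{iii)}\}$$

where the predicate $\mathit{Used}^{\varphi}$ reflects replacements of original computations as well as usages at initialisation points of larger expression patterns:

$$\mathit{Used}^{\varphi}_{\dot n} \;\overset{def}{\Leftrightarrow}\; \mathrm{EM}_\Phi\text{-}\mathit{Replace}^{\varphi}_{\dot n} \vee \exists \psi \in \mathit{SupExpr}_\Phi(\varphi).\ \mathrm{EM}_\psi\text{-}\mathit{Insert}_{\dot n}$$

As in Definition 3.3.1, the parts of the definition marked (i) and (ii) capture initialisations at node $p_1$ and usages at node $p_{|p|}$, respectively. The part indicated by (iii) establishes the correspondence between (i) and (ii). Moreover, the definition coincides with the former notion for maximal expressions in $\Phi$. Hence

$$\forall \varphi \in \Phi.\ \mathit{SupExpr}_\Phi(\varphi) = \emptyset \Rightarrow \mathit{IUP}(\mathrm{EM}_\Phi, \varphi) = \mathit{IRP}(\mathrm{EM}_\varphi, \varphi) \qquad (4.3)$$

The use of insertion-use paths instead of insertion-replacement paths is the only difference in the modified notion of lifetime ranges. Hence, for $\mathrm{EM}_\Phi \in \mathcal{AEM}_\Phi$, the structured lifetime range with respect to $\varphi \in \Phi$ is defined by: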
Definition 4.2.2 (Structured Lifetime Ranges).

$$\mathit{SLtRg}(\mathrm{EM}_\Phi, \varphi) \overset{def}{=} \bigcup_{p \in \mathit{IUP}(\mathrm{EM}_\Phi, \varphi)} \{p_i \mid 1 \le i < |p|\}$$



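An important observation is that a program point belonging to a structured lifetime range has a uniform history with respect to this lifetime range, i.e. the associated temporary is initialised on every program path leading to the program point. More formally, this reads as:

Lemma 4.2.7 (Lifetime Range Lemma). Let $\mathrm{EM}_\Phi \in \mathcal{AEM}_\Phi$. Then

$$\forall \varphi \in \Phi,\ \dot n \in \dot N.\ \dot n \in \mathit{SLtRg}(\mathrm{EM}_\Phi, \varphi) \Rightarrow \mathrm{EM}_\varphi\text{-}\mathit{Correct}_{\dot n}$$

Proof. Let $\dot n \in \mathit{SLtRg}(\mathrm{EM}_\Phi, \varphi)$. Then Definition 4.2.2 ensures that there is an insertion-use path $p$ with $|p| \ge 2$ such that

$$\underbrace{\mathrm{EM}_\varphi\text{-}\mathit{Insert}_{p_1}}_{i)} \wedge \underbrace{\mathit{Used}^{\varphi}_{p_{|p|}}}_{ii)} \wedge \underbrace{\forall 1 < i \le |p|.\ \neg\mathrm{EM}_\varphi\text{-}\mathit{Insert}_{p_i}}_{iii)}$$

and there is an index $1 \le k < |p|$ such that $p_k = \dot n$. Due to the initialisation conditions of structured expression motion, Proposition (ii) ensures $\mathrm{EM}_\varphi\text{-}\mathit{Correct}_{p_{|p|}}$. Using the absence of other insertions according to Condition (iii), we obviously have

$$\forall 1 \le i \le |p|.\ \mathrm{EM}_\varphi\text{-}\mathit{Correct}_{p_i},$$

which in particular forces $\mathrm{EM}_\varphi\text{-}\mathit{Correct}_{\dot n}$. □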

4.2.3.2 Lifetime Optimality. Under the new notion of lifetime ranges it is obviously not possible to keep all lifetime ranges as small as possible. Our example in Figure 4.2 already showed that such an aim would indeed be too narrow, since the overall costs imposed by the usage of temporaries do not depend on the actual names³ of the lifetime ranges including a program point, but rather on their number. This leads to a new notion of the lifetime-better preorder capturing their cumulated effects.

Definition 4.2.3 (Lifetime Optimality).

1. An expression motion $\mathrm{EM}_\Phi \in \mathcal{COEM}_\Phi$ is lifetime better than an expression motion $\mathrm{EM}'_\Phi \in \mathcal{COEM}_\Phi$ if and only if for any $\dot n \in \dot N$

$$|\{\varphi \in \Phi \mid \dot n \in \mathit{SLtRg}(\mathrm{EM}_\Phi, \varphi)\}| \le |\{\varphi \in \Phi \mid \dot n \in \mathit{SLtRg}(\mathrm{EM}'_\Phi, \varphi)\}|$$

As before, we will use the notation $\mathrm{EM}_\Phi \lesssim^{lt}_\Phi \mathrm{EM}'_\Phi$ to refer to this situation.

2. An expression motion $\mathrm{EM}_\Phi \in \mathcal{COEM}_\Phi$ is lifetime optimal with respect to $\Phi$ if and only if it is lifetime better than any other transformation in $\mathcal{COEM}_\Phi$.

³ That means the names of their associated expressions.
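The cumulative criterion of Definition 4.2.3(1) only compares how many temporaries are alive at each program point. The following Python sketch assumes that the structured lifetime ranges of both expression motions are already given as sets of program points. The points u, v, w are a hypothetical abstraction of the path from node 1 to node 2 in Figure 4.2, with the purely local lifetimes at node 1 ignored, as in the text.

```python
# Sketch of Definition 4.2.3(1). Assumption: each expression motion is represented
# by a dict pattern -> set of program points (its structured lifetime range).


def pressure(ranges, points):
    """Number of temporaries whose lifetime range contains each point."""
    return {n: sum(1 for rng in ranges.values() if n in rng) for n in points}


def lifetime_better(ranges1, ranges2, points):
    """EM1 is lifetime better than EM2 iff it never keeps more temporaries alive."""
    p1, p2 = pressure(ranges1, points), pressure(ranges2, points)
    return all(p1[n] <= p2[n] for n in points)


# Hypothetical abstraction of Figure 4.2: u, v, w lie between node 1 and node 2.
POINTS = ["u", "v", "w"]
LATE = {"a*b": {"u", "v", "w"}, "c*d": {"u", "v", "w"}, "a*b+c*d": set()}
EARLY = {"a*b": set(), "c*d": set(), "a*b+c*d": {"u", "v", "w"}}

print(lifetime_better(EARLY, LATE, POINTS), lifetime_better(LATE, EARLY, POINTS))  # True False
```

Since the comparison is pointwise, the relation is only a preorder: two expression motions can easily be incomparable if each needs fewer temporaries at some point.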

It should be noted that Lemma 4.2.7 ensures that the cumulation of lifetime 
ranges is adequate to model the usage of temporaries at a program point, as 
the temporaries are uniformly initialised on every program path that reaches 
the program point. For this reason the number of occupied temporaries at a 
program point does not depend on the particular history when reaching the 
program point. 

As an immediate consequence of Definition 4.2.3 we have as a first result:

Lemma 4.2.8 (Lifetime Optimality Lemma). Any lifetime optimal expression motion $\mathrm{EM}_\Phi \in \mathcal{COEM}_\Phi$ is lifetime better than both $\mathrm{BEM}_\Phi$ and $\mathrm{LEM}_\Phi$.

4.2.3.3 Inductive Lifetime Optimality. In addition to the previous notion, we also consider an alternative notion of lifetime optimality that is based on the special role of lazy expression motion in the flat approach. Here the fundamental observation is that the minimal expression patterns of $\Phi$ cannot participate in any profitable trade-offs with their subexpressions. This suggests determining their insertion points by means of the lazy expression motion strategy. Using the minimal expression patterns as the seed for initialisations with respect to more complex expression patterns leads to an inductive notion of lifetime optimality:

Definition 4.2.4 (Inductive Lifetime Optimality).

An expression motion $EM_\Phi \in \mathcal{COEM}_\Phi$ is inductively lifetime optimal with respect to $\Phi$ if and only if it satisfies the following inductive characterisation:

1. $EM_{\Phi^0}$ is lifetime optimal and

2. $\forall i > 0,\ EM'_\Phi \in \mathcal{COEM}_\Phi.\ EM'_{\Phi^{<i}} = EM_{\Phi^{<i}} \Rightarrow EM_{\Phi^{\le i}} \precsim^{lt}_\Phi EM'_{\Phi^{\le i}}$

Essentially, the inductive notion differs from the full notion in its restricted field of view: it only aims at optimising the next local step from level $i-1$ towards level $i$. In general, this leads to results that are strictly weaker than their fully lifetime optimal counterparts. On the other hand, a lifetime optimal program is not necessarily inductively lifetime optimal either. Later, in Section 4.5.2, we will discuss the limitations of inductive lifetime optimality in more detail. Nonetheless, there are two reasons for also considering this notion. First, the restricted notion leads to situations where trade-offs between lifetime ranges are much more evident. In fact, our algorithm that reaches full lifetime optimality would probably not have been found without the previous work on the inductive case. Second, a solution for the inductive notion can be computed more efficiently, in terms of computational complexity, than one that meets the general condition (cf. Section 4.6).
Inductive lifetime optimality is tailored only towards improving lazy expression motion, while busy expression motion might do better by chance. Hence, as a counterpart to Lemma 4.2.8, a simple induction proof shows:

Lemma 4.2.9 (Inductive Lifetime Optimality Lemma).

Any inductively lifetime optimal expression motion $EM_\Phi \in \mathcal{COEM}_\Phi$ is lifetime better than $LEM_\Phi$, i.e.

$$EM_\Phi \precsim^{lt}_\Phi LEM_\Phi$$



4.3 A Graph Theoretical View on Trade-Offs between Lifetimes

In this section we present the foundation for the development of our algo- 
rithms for lifetime optimal and inductively lifetime optimal expression mo- 
tion. Reduced to its abstract kernel this leads to a graph theoretical problem 
that can be solved efficiently by using well-known matching techniques for 
bipartite graphs. 



4.3.1 Motivation 

As a motivation, let us consider a restricted situation where, at a given program point, we are only faced with two kinds of expressions: elementary ones and composite ones whose operands are elementary. Let us further assume that the insertion points of elementary expressions are already determined in advance. In this situation we are left with two tasks. We must identify those composite expressions whose premature initialisation at the program point pays off. We must also identify those elementary expressions whose associated temporaries can be released. Hence this trade-off situation is modelled by means of the following two sets:

— A set of lower trade-off candidates, i. e. a set of elementary expressions that 
are already initialised at the program point and that are used at most for 
the evaluation of large expressions that can be initialised there. 

— A set of upper trade-off candidates, i. e. a set of large expressions whose 
corresponding temporaries can either be initialised immediately at this 
program point or optionally be postponed to later program points. 

Essentially, the problem we are interested in is to find a subset of lower 
trade-off candidates whose set of neighbouring upper trade-off candidates 
is smaller in size. In this case it is profitable to terminate all the lifetime 
ranges associated with the chosen lower trade-off candidates at the price of 
starting lifetime ranges that are associated with the related upper trade-off 
candidates. 
4.3.2 Tight Sets in Bipartite Graphs

The situation above is naturally modelled by means of a bipartite graph 
(cf. Section 2.1.1) with edges between lower and upper trade-off candidates 
reflecting the subexpression relation. 

In the graph theoretical view the problem of finding a profitable trade-off 
between lower and upper trade-off candidates leads to the notion of tight 
sets [LP86], i. e. subsets of S that are covered by fewer vertices of T: 

Definition 4.3.1 (S-Tight Sets).

Let $(S \uplus T, E)$ be a bipartite graph and $S' \subseteq S$.

1. The value of the difference $|S'| - |neigh(S')|$ is called the S-deficiency of $S'$, in symbols $defic_S(S')$.

2. If $defic_S(S') \ge 0$, then $S'$ is called an S-deficiency set.

3. If $S'$ is of maximum deficiency among all subsets of $S$, then it is called S-tight.

For all notions the parameter S may be skipped whenever S is understood from the context. Note that in particular $\emptyset$ is a deficiency set.
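As a reference point for the efficient procedures developed below, the following minimal Python sketch computes deficiencies and finds tight sets by brute-force enumeration of all subsets of S. Edges are assumed to be given as (s, t) pairs. The helper names `neigh`, `defic` and `tight_sets_bruteforce` are hypothetical. The enumeration is exponential in |S|, so this is only a checking aid for tiny graphs.

```python
from itertools import combinations


def neigh(subset, edges):
    """Neighbourhood in T of a subset of S; edges are (s, t) pairs."""
    return {t for s, t in edges if s in subset}


def defic(subset, edges):
    """S-deficiency |S'| - |neigh(S')| as in Definition 4.3.1."""
    return len(subset) - len(neigh(subset, edges))


def tight_sets_bruteforce(S, edges):
    """Maximum deficiency and all S-tight sets, by enumerating every subset.

    Exponential in |S|: only meant as a reference for tiny graphs.
    """
    subsets = [frozenset(c) for k in range(len(S) + 1)
               for c in combinations(sorted(S), k)]
    best = max(defic(x, edges) for x in subsets)
    return best, [x for x in subsets if defic(x, edges) == best]


S = {"a", "b", "c"}
edges = [("a", 1), ("b", 1), ("c", 2)]
best, tight = tight_sets_bruteforce(S, edges)
print(best)                              # 1
print(sorted(sorted(x) for x in tight))  # [['a', 'b'], ['a', 'b', 'c']]
```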

4.3.3 An Efficient Decision Procedure 

Let us first focus on our problem as a pure decision problem. 

Problem 4.3.1 (Existence of a non-trivial S-deficiency set).
Instance: A bipartite graph $(S \uplus T, E)$.

Question: Does S possess an S-deficiency set $S' \neq \emptyset$?

Obviously, this problem is decidable, as the number of subsets is finite. The point of interest is rather whether it can be answered efficiently. At first glance a solution is not at all obvious. In fact, the straightforward approach would be to enumerate all subsets of S and then to check whether their deficiency is non-negative. Since such a check is clearly in P, one might suspect that the problem is yet another member of the large class of NP-complete problems [GJ79] in the field of register allocation [CAC+81, Cha82, AJU77]. Fortunately, Problem 4.3.1 can be solved efficiently by employing some graph theoretical results on matchings in bipartite graphs.

4.3.3.1 Matchings in Bipartite Graphs.

Definition 4.3.2 (Matchings). Let $(S \uplus T, E)$ be a bipartite graph.

1. A subset of independent edges $M \subseteq E$ is called a matching, i.e. for the edges of M we have: $\forall e_1, e_2 \in M.\ e_1 \neq e_2 \Rightarrow e_1 \cap e_2 = \emptyset$

2. A matching M is called a maximum (cardinality) matching if and only if $|M|$ is maximal.

3. A matching M is complete if and only if $|M| = \min\{|S|, |T|\}$.
Figure 4.3 gives examples of maximum and complete matchings. Obviously, every complete matching is maximum. The converse does not necessarily hold, as Figure 4.3(b) shows. Note that here, as in the following examples, bold lines are chosen in order to emphasise edges belonging to the matching.

Fig. 4.3. a) A complete matching, b) A maximum yet non-complete matching

Consider a matching M. A vertex $v \in S \uplus T$ is called M-matched (or simply matched, if M is understood from the context) if $v \in e$ for some $e \in M$. A path that alternately contains edges in M and edges not in M is called an (M-)alternating path.
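The three notions of Definition 4.3.2 can be checked mechanically with a minimal Python sketch. It assumes that edges are (s, t) pairs and that the vertex names of S and T are disjoint. The helper names are hypothetical. Maximality is checked here by exhaustive search, which is only viable for tiny graphs. The two graphs are small examples in the spirit of Figure 4.3, not the figure's actual graphs.

```python
from itertools import combinations


def is_matching(M, edges):
    """M is a subset of the edges and no two edges of M share a vertex."""
    if not set(M) <= set(edges):
        return False
    used = set()
    for s, t in M:
        if s in used or t in used:
            return False
        used.update((s, t))
    return True


def is_complete(M, edges, S, T):
    """A matching whose size equals min(|S|, |T|)."""
    return is_matching(M, edges) and len(M) == min(len(S), len(T))


def maximum_size_bruteforce(edges):
    """Largest k such that some k edges form a matching (exponential)."""
    for k in range(len(edges), 0, -1):
        if any(is_matching(c, edges) for c in combinations(edges, k)):
            return k
    return 0


# A complete matching.
edges_a = [("a", 1), ("b", 1), ("c", 2)]
M_a = [("a", 1), ("c", 2)]
print(is_complete(M_a, edges_a, ["a", "b", "c"], [1, 2]))  # True

# A maximum but non-complete matching: vertex 2 cannot be matched at all.
edges_b = [("a", 1), ("b", 1)]
M_b = [("a", 1)]
print(len(M_b) == maximum_size_bruteforce(edges_b))        # True
print(is_complete(M_b, edges_b, ["a", "b"], [1, 2]))       # False
```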

Obviously, Decision Problem 4.3.1 is trivial under the assumption that $|S| \ge |T|$, since we may simply choose $S' = S$. Therefore, let us assume that $|S| < |T|$. In this case the solution to Problem 4.3.1 is due to a result of Hall [Hal35]:

Theorem 4.3.1 (P. Hall, 1935). Let $(S \uplus T, E)$ be a bipartite graph with $|S| \le |T|$. Then this graph has a complete matching if and only if

$$\forall S' \subseteq S.\ |neigh(S')| \ge |S'|.$$

Hall’s Theorem is commonly associated with the marriage problem: given a set of girls S and a set of boys T, can all the girls be married off to a candidate of their choice? Hall’s Theorem gives a surprisingly simple answer to this question. A marriage arrangement can be found if and only if every group of girls can choose among a set of candidates that is not smaller than the group itself.
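Hall's condition can be tested literally, group by group, with a minimal Python sketch. The example data, the names and the function `satisfies_hall` are hypothetical. The check enumerates all non-empty groups and is therefore exponential. The point of the theorem is that the same answer can be obtained from a maximum matching instead.

```python
from itertools import combinations


def satisfies_hall(S, edges):
    """Check |neigh(S')| >= |S'| for every non-empty S' of S (exponential)."""
    for k in range(1, len(S) + 1):
        for group in combinations(sorted(S), k):
            candidates = {t for s, t in edges if s in group}
            if len(candidates) < k:
                return False
    return True


girls = ["anna", "bea", "cleo"]
likes = [("anna", "x"), ("anna", "y"), ("bea", "x"), ("cleo", "x")]
print(satisfies_hall(girls, likes))   # False: bea and cleo both only like x

likes2 = [("anna", "x"), ("anna", "y"), ("bea", "x"), ("cleo", "z")]
print(satisfies_hall(girls, likes2))  # True: anna-y, bea-x, cleo-z works
```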

Hall’s Theorem also provides an immediate solution to a variant of Problem 4.3.1 where $defic(S') > 0$ is required in place of $defic(S') \ge 0$. This variant can be answered positively if and only if no complete matching exists. On the other hand, Problem 4.3.1 can be efficiently reduced to the >-variant. This is achieved by testing the variant problem for every bipartite graph that results from removing a single vertex of T. This ensures that a non-empty subset with zero deficiency will have a positive deficiency in at least one of these graphs, while the deficiency of no subset can turn from negative to positive. Since a complete matching, i.e. one of size $|S|$, is in particular a maximum one, the problem finally reduces to the computation of an arbitrary maximum matching, which has to be checked for its size. Fortunately, there are efficient algorithms to cope with this problem. The central idea of these algorithms is based on the characterisation of maximum matchings by means of augmenting paths.
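The reduction just described can be sketched in Python as follows, assuming edges are (s, t) pairs. For brevity, the maximum matching is computed with the well-known depth-first augmenting-path search (often attributed to Kuhn), which is swapped in here for illustration. The function names are hypothetical.

```python
def max_matching_size(S, edges):
    """Size of a maximum matching via depth-first augmenting paths."""
    adj = {s: [t for u, t in edges if u == s] for s in S}
    match_t = {}  # T-vertex -> matched S-vertex

    def augment(s, seen):
        for t in adj[s]:
            if t in seen:
                continue
            seen.add(t)
            # t is free, or its current partner can be re-matched elsewhere
            if t not in match_t or augment(match_t[t], seen):
                match_t[t] = s
                return True
        return False

    return sum(1 for s in S if augment(s, set()))


def has_nontrivial_deficiency_set(S, T, edges):
    """Decide Problem 4.3.1: is there S' != {} with |S'| >= |neigh(S')|?"""
    if not S:
        return False
    if len(S) >= len(T):
        return True  # S' = S itself works
    for removed in T:
        rest = [(s, t) for s, t in edges if t != removed]
        # Here |S| <= |T| - 1, so by Hall's Theorem: no complete matching
        # <=> some S' has positive deficiency in the reduced graph.
        if max_matching_size(S, rest) < len(S):
            return True
    return False


print(has_nontrivial_deficiency_set(["a", "b"], [1, 2, 3], [("a", 1), ("b", 1)]))  # True
print(has_nontrivial_deficiency_set(["a"], [1, 2], [("a", 1), ("a", 2)]))          # False
```

Each of the |T| tests costs one matching computation, so the whole decision stays polynomial.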

Definition 4.3.3 (Augmenting Path). Let $(S \uplus T, E)$ be a bipartite graph and $M \subseteq E$ be a matching. An M-alternating path that starts and ends with an unmatched vertex is called an augmenting path with respect to M.

The fundamental relationship between maximum matchings and augmenting 
paths is due to Berge [Ber57]: 

Theorem 4.3.2 (Berge 1957). Let $(S \uplus T, E)$ be a bipartite graph and $M \subseteq E$ be a matching. M is maximum if and only if there is no augmenting path with respect to M.

This characterisation gives rise to an efficient procedure to determine maximum matchings. A standard algorithm successively enlarges a possibly empty initial matching by constructing augmenting paths in a breadth-first discipline. In this process, for each depth an arbitrary augmenting path π is selected, provided there is one. Then π is used to construct a larger matching M' from the current matching M in the following way: the roles of the edges of π are switched, i.e. the M-edges of π are removed and, conversely, the non-M-edges of π are added. This process is illustrated in Figure 4.4.




Fig. 4.4. a) A non-maximum matching M b) An augmenting path with respect to 
M c) The maximum matching induced by the augmenting path of b). 



A more elaborate description of such an algorithm, which runs with a worst-case time complexity of order $O(|V|\,|E|)$ for a bipartite graph $(V, E)$, can be found in [AHU83]. Hopcroft and Karp [HK73] present a more sophisticated algorithm that improves this bound to $O(|V|^{1/2}\,|E|)$.
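The augmenting-path principle can be sketched in Python, assuming edges are (s, t) pairs. The sketch is a simplified variant: it runs one breadth-first search per free S-vertex rather than the phase structure of the algorithms cited above. Each search looks for an alternating path to an unmatched T-vertex and then switches the roles of the edges on that path, as in Figure 4.4. The name `max_matching` is hypothetical.

```python
from collections import deque


def max_matching(S, edges):
    """Maximum matching of a bipartite graph as a set of (s, t) pairs.

    For every S-vertex in turn, search breadth-first for an augmenting path
    (non-matching S->T edges alternating with matching T->S edges) that ends
    in an unmatched T-vertex, then switch the roles of the edges on it.
    """
    adj = {s: [] for s in S}
    for s, t in edges:
        adj[s].append(t)
    match_s, match_t = {}, {}
    for root in S:
        parent = {}  # T-vertex -> S-vertex from which it was reached
        queue = deque([root])
        end = None
        while queue and end is None:
            s = queue.popleft()
            for t in adj[s]:
                if t in parent:
                    continue
                parent[t] = s
                if t not in match_t:
                    end = t  # unmatched T-vertex: augmenting path found
                    break
                queue.append(match_t[t])  # continue along the matching edge
        # Switch the path: old matching edges leave M, the others enter M.
        t = end
        while t is not None:
            s = parent[t]
            previous = match_s.get(s)
            match_s[s], match_t[t] = t, s
            t = previous
    return set(match_s.items())


print(sorted(max_matching(["a", "b"], [("a", 1), ("a", 2), ("b", 1)])))
# [('a', 2), ('b', 1)]
```

In the example, the greedy choice a-1 blocks b. The augmenting path b-1-a-2 repairs this by re-matching a to 2.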



4.3.4 Efficient Computation of Tight Sets 

Unfortunately, the previous decision procedure is not constructive. However, we are actually interested in determining a subset of maximum deficiency, i.e. a tight set.

Problem 4.3.2 (Finding a Tight Set).

Instance: A bipartite graph $(S \uplus T, E)$.

Goal: Find a tight set of S.
Fortunately, tight sets can be computed efficiently, too, by employing maximum matchings. Here we present two algorithms suited for this purpose, one computing the largest and the other one the smallest tight set. Both algorithms are based on ideas used for determining the Gallai-Edmonds structure of a graph (cf. [LP86]). However, we give a much simpler formulation that is directly tailored to our application.

The algorithm computing the largest optimal solution successively removes vertices from an initial approximation $S_m$ until a fixed point is reached:

Algorithm 4.3.1.

Input: Bipartite graph $G = (S \uplus T, E)$ and a maximum matching M
Output: The largest S-tight set $\mathcal{T}^\top(S) \subseteq S$

S_m := S;
R := {t ∈ T | t is unmatched};
WHILE R ≠ ∅ DO
  choose some x ∈ R;
  R := R \ {x};
  IF x ∈ S
  THEN S_m := S_m \ {x};
       R := R ∪ {y | {x,y} ∈ M}
  ELSE R := R ∪ neigh(x)
  FI
OD;
T^⊤(S) := S_m
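Algorithm 4.3.1 can be transcribed into Python, assuming disjoint vertex names for S and T, edges as (s, t) pairs, and a maximum matching supplied by the caller. Following the remark in footnote 4, the sketch adds a `seen` set so that each vertex enters the workset at most once. This also guarantees termination. The function name and the example graph are hypothetical.

```python
def largest_tight_set(S, T, edges, matching):
    """Sketch of Algorithm 4.3.1.

    Every S-vertex reachable from an unmatched T-vertex by an alternating
    path is removed; the remaining S-vertices are returned.  `matching` is
    assumed to be a maximum matching given as (s, t) pairs.
    """
    S = set(S)
    neigh_t = {}  # T-vertex -> adjacent S-vertices
    for s, t in edges:
        neigh_t.setdefault(t, set()).add(s)
    partner = {}  # matching partner of each matched vertex
    for s, t in matching:
        partner[s], partner[t] = t, s

    S_m = set(S)
    R = [t for t in T if t not in partner]  # start: unmatched T-vertices
    seen = set(R)  # each vertex enters R at most once (cf. footnote 4)
    while R:
        x = R.pop()
        if x in S:
            S_m.discard(x)
            successors = [partner[x]] if x in partner else []  # matching edge
        else:
            successors = neigh_t.get(x, ())                    # any edge
        for y in successors:
            if y not in seen:
                seen.add(y)
                R.append(y)
    return S_m


S = ["a", "b", "c"]
T = [1, 2, 3]
edges = [("a", 1), ("b", 1), ("c", 2), ("c", 3)]
M = [("a", 1), ("c", 2)]  # a maximum matching of this graph
print(sorted(largest_tight_set(S, T, edges, M)))  # ['a', 'b']
```

Here the unmatched vertex 3 exposes c via the alternating path 3-c-2, so c is removed. The set {a, b}, with deficiency 2 - 1 = 1, remains.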

The algorithm that computes the smallest solution proceeds the other way round. It successively adds vertices of S to an initial approximation $S_m$ until a fixed point is reached:

Algorithm 4.3.2.

Input: Bipartite graph $G = (S \uplus T, E)$ and a maximum matching M
Output: The smallest S-tight set $\mathcal{T}^\bot(S) \subseteq S$

S_m := ∅;
A := {s ∈ S | s is unmatched};
WHILE A ≠ ∅ DO
  choose some x ∈ A;
  A := A \ {x};
  IF x ∈ S
  THEN S_m := S_m ∪ {x};
       A := A ∪ neigh(x)
  ELSE A := A ∪ {y | {x,y} ∈ M}
  FI
OD;
T^⊥(S) := S_m
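The dual procedure can be sketched in Python under the same assumptions: disjoint vertex names, (s, t) edge pairs, a caller-supplied maximum matching, and a `seen` set so that each vertex is processed once. It collects every S-vertex reachable from an unmatched S-vertex by an alternating path. The function name is hypothetical.

```python
def smallest_tight_set(S, edges, matching):
    """Sketch of Algorithm 4.3.2.

    Collect every S-vertex reachable from an unmatched S-vertex by an
    alternating path (any edge S->T, matching edge T->S).
    """
    S = set(S)
    neigh_s = {}  # S-vertex -> adjacent T-vertices
    for s, t in edges:
        neigh_s.setdefault(s, set()).add(t)
    partner = {}
    for s, t in matching:
        partner[s], partner[t] = t, s

    S_m = set()
    A = [s for s in S if s not in partner]  # start: unmatched S-vertices
    seen = set(A)
    while A:
        x = A.pop()
        if x in S:
            S_m.add(x)
            successors = neigh_s.get(x, ())                    # any edge
        else:
            successors = [partner[x]] if x in partner else []  # matching edge
        for y in successors:
            if y not in seen:
                seen.add(y)
                A.append(y)
    return S_m


edges = [("a", 1), ("b", 1), ("c", 2)]
M = [("a", 1), ("c", 2)]
print(sorted(smallest_tight_set(["a", "b", "c"], edges, M)))  # ['a', 'b']
```

On this graph, Algorithm 4.3.1 would return {a, b, c}. Both sets have deficiency 1, which shows why the two strategies must not be mixed.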
Both algorithms are perfectly suitable as the basic ingredient of an expression motion algorithm that exploits the optimal trade-off information. However, we have to decide on one strategy, since the globalisation presented in Section 4.4 requires consistency of the trade-off information. In other words, the solutions computed by Algorithm 4.3.1 and Algorithm 4.3.2 must not be mixed when computing tight sets within our overall algorithm. For the sake of presentation, we will choose Algorithm 4.3.1 from now on as the relevant trade-off algorithm. Let us therefore take a closer look at how this algorithm works.

The algorithm starts with an upper approximation of $S_m$. It then successively finds those vertices that can be reached through an alternating path originating at an unmatched vertex of T, and eliminates them from $S_m$. Informally, this process ensures that all removed S-vertices are matched. Otherwise, this would establish an augmenting path, contradicting the maximality of M. This is illustrated in Figure 4.5, where black circles indicate removed vertices, i.e. all vertices that have been added to R at least once.⁴

Denoting the matching partner of a removed vertex $s \in S$ by $M(s)$, we can easily see that $neigh(\mathcal{T}^\top(S)) \cap M(S \setminus \mathcal{T}^\top(S)) = \emptyset$. Hence the subgraph induced by $S \setminus \mathcal{T}^\top(S)$ and $T \setminus neigh(\mathcal{T}^\top(S))$ is of negative deficiency and can be removed. Based on this idea, we can prove the main result of this section:

Theorem 4.3.3 (Tight Set Theorem). For a bipartite graph $(S \uplus T, E)$, the set $\mathcal{T}^\top(S)$ constructed by Algorithm 4.3.1 is the largest S-tight set, i.e.

1. $\mathcal{T}^\top(S)$ contains every S-tight set and

2. $\mathcal{T}^\top(S)$ is tight itself.

Proof of 1: Let S' be a tight set. Let us further denote the values of R and $S_m$ after the i-th iteration by $R^i$ and $S_m^i$, respectively. Then, by means of an induction on i, we are going to show:

$$S' \subseteq S_m^i$$

Induction Base: $i = 0$. In this case the inclusion $S' \subseteq S_m^0 = S$ is trivial.

Induction Step: $i > 0$. Without loss of generality let $s \in S_m^{i-1} \cap R^{i-1}$ be the element that is removed from $S_m$ in the i-th iteration step. Then we are going to show that $s \notin S'$.

Due to the construction of R, we may assume without loss of generality that s has a neighbour $t \in neigh(s)$ with $\{s,t\} \notin M$ which once added s to the workset. On the other hand, s must be matched. Otherwise, by construction there would be an augmenting path between s and an initial vertex $t \in R^0$, which would contradict the maximality of M. Thus there is also a neighbour t' of s such that $\{s, t'\} \in M$.



⁴ In fact, by marking processed nodes, the algorithm could easily be modified such that a vertex is added to R at most once.
Let us now assume, to the contrary, that $s \in S'$, and define a set $\hat S$ that contains all vertices in S' that are reachable from s via alternating paths starting with the matching edge $\{s, t'\}$. Immediately by construction we have:

i) $\hat S \subseteq S'$

ii) $neigh(S' \setminus \hat S) \cap neigh(\hat S) = \emptyset$

iii) $|\hat S| < |neigh(\hat S)|$

The idea behind this construction is sketched in Figure 4.6. It now remains to show that $S' \setminus \hat S$ has a greater deficiency than S', which contradicts the assumption that S' is tight. To this end we calculate

$$\begin{aligned}
|S' \setminus \hat S| - |neigh(S' \setminus \hat S)|
&= |S'| - |\hat S| - \big(|neigh(S')| - |neigh(\hat S)|\big) && \text{[Assumptions (i) \& (ii)]}\\
&= |S'| - |neigh(S')| - \big(|\hat S| - |neigh(\hat S)|\big)\\
&\ge |S'| - |neigh(S')| + 1 && \text{[Assumption (iii)]}
\end{aligned}$$
Fig. 4.6. Illustrating the construction of $\hat S$

Proof of 2: Let us assume an arbitrary S-tight set S'. According to the first part we can assume that $S' \subseteq \mathcal{T}^\top(S)$ and thus also $neigh(S') \subseteq neigh(\mathcal{T}^\top(S))$. By construction every vertex in $neigh(\mathcal{T}^\top(S))$ is matched. In particular, every $t \in neigh(\mathcal{T}^\top(S)) \setminus neigh(S')$ is matched by a unique adjacent vertex in $\mathcal{T}^\top(S) \setminus S'$. This situation is sketched in Figure 4.7.

Fig. 4.7. Preservation of the deficiency of S' within $\mathcal{T}^\top(S)$

In particular, this guarantees:

$$|neigh(\mathcal{T}^\top(S)) \setminus neigh(S')| \le |\mathcal{T}^\top(S) \setminus S'| \qquad (4.4)$$

Thus we have:

$$\begin{aligned}
|\mathcal{T}^\top(S)| - |neigh(\mathcal{T}^\top(S))|
&= |S'| + |\mathcal{T}^\top(S) \setminus S'| - \big(|neigh(S')| + |neigh(\mathcal{T}^\top(S)) \setminus neigh(S')|\big) && [S' \subseteq \mathcal{T}^\top(S)]\\
&= |S'| - |neigh(S')| + \underbrace{\big(|\mathcal{T}^\top(S) \setminus S'| - |neigh(\mathcal{T}^\top(S)) \setminus neigh(S')|\big)}_{\ge 0}\\
&\ge |S'| - |neigh(S')| && \text{[Inequation (4.4)]}
\end{aligned}$$

which means that the deficiency of $\mathcal{T}^\top(S)$ is no worse than that of S'. □

Figure 4.8 gives some examples of tight sets as they are computed by means of Algorithm 4.3.1.

Fig. 4.8. Tight sets

We close this section by introducing a class of subsets that becomes important when we have to consider the relationship between bipartite graphs associated with adjacent program points. Although these bipartite graphs usually do not coincide, some parts of the tight set at a program point n are preserved within the tight sets of its successor points. To this end we introduce:

Definition 4.3.4 (Irreducible Subsets).

Let $G = (S \uplus T, E)$ be a bipartite graph and $S' \subseteq S$.

1. A pair $(\hat S, \hat T)$ with $\hat S \subseteq S'$ and $\hat T \subseteq T$ is a tightness defect of S' if and only if the following three conditions are met:

a) $|\hat T| > |\hat S|$

b) $\hat T \subseteq neigh(\hat S)$

c) $neigh(S' \setminus \hat S) \cap \hat T = \emptyset$

2. $S' \subseteq S$ is called irreducible if and only if it has no tightness defects; otherwise it is called reducible.

Obviously, an irreducible subset S' C S is a deficiency set, too, as otherwise 
(S', neigh(S')) itself would be a tightness defect. Actually, the criterion of 
being irreducible is much stronger. In particular, the absence of tightness 
defects means that there are no subsets of S' whose removal increases the 
deficiency (see Figure 4.9 for illustration). 
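Irreducibility can be tested by brute force in a minimal Python sketch, assuming edges are (s, t) pairs. For a fixed $\hat S$, conditions (b) and (c) allow at most $\hat T = neigh(\hat S) \setminus neigh(S' \setminus \hat S)$. It therefore suffices to check condition (a) for this largest choice, and $\hat S = \emptyset$ can never yield a defect. The helper names are hypothetical, and the search is exponential in |S'|.

```python
from itertools import combinations


def neigh(subset, edges):
    """Neighbourhood in T of a subset of S; edges are (s, t) pairs."""
    return {t for s, t in edges if s in subset}


def find_tightness_defect(S_prime, edges):
    """Return some tightness defect (S_hat, T_hat) of S_prime, or None."""
    S_prime = set(S_prime)
    for k in range(1, len(S_prime) + 1):
        for group in combinations(sorted(S_prime), k):
            S_hat = set(group)
            # Largest T_hat satisfying conditions (b) and (c) for this S_hat.
            T_hat = neigh(S_hat, edges) - neigh(S_prime - S_hat, edges)
            if len(T_hat) > len(S_hat):  # condition (a)
                return S_hat, T_hat
    return None


def is_irreducible(S_prime, edges):
    return find_tightness_defect(S_prime, edges) is None


edges = [("a", 1), ("b", 1), ("c", 2), ("c", 3)]
print(find_tightness_defect({"a", "b", "c"}, edges))  # ({'c'}, {2, 3})
print(is_irreducible({"a", "b"}, edges))              # True
```

Removing the defect part {c} from {a, b, c} drops two neighbours at the price of one vertex, so the deficiency increases from 0 to 1.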

Clearly, tight sets are irreducible. However, irreducible sets provide a more fine-grained characterisation, which is expressed in the following result.

Lemma 4.3.1 (Irreducible Subset Lemma).

Let $G = (S \uplus T, E)$ be a bipartite graph. Then we have:

1. If $T' \subseteq neigh(\mathcal{T}^\top(S))$, then $neigh(T') \cap \mathcal{T}^\top(S)$ is an irreducible subset of S.

2. If $S' \subseteq S$ is an irreducible subset, then $S' \subseteq \mathcal{T}^\top(S)$.
Fig. 4.9. Tightness defect of a set with positive deficiency

Proof. For the proof of Part (1), let us assume that $S' \stackrel{def}{=} neigh(T') \cap \mathcal{T}^\top(S)$ has a tightness defect $(\hat S, \hat T)$ (see Figure 4.10(a) for illustration). By construction, $neigh(\mathcal{T}^\top(S) \setminus S') \cap T' = \emptyset$. This would enable us to remove the vertices of $\hat S$ from $\mathcal{T}^\top(S)$ while strictly improving the deficiency of $\mathcal{T}^\top(S)$, which contradicts its tightness.

For the proof of Part (2), let us assume an irreducible subset $S' \subseteq S$ with $S' \not\subseteq \mathcal{T}^\top(S)$ (see Figure 4.10(b) for illustration). Let us consider $\hat S \stackrel{def}{=} S' \setminus \mathcal{T}^\top(S)$ and $\hat T \stackrel{def}{=} neigh(S') \setminus neigh(\mathcal{T}^\top(S))$. It is easy to see that $(\hat S, \hat T)$ meets conditions (b) and (c) of Definition 4.3.4(1). Hence condition (a) must be violated, which means $|\hat T| \le |\hat S|$; otherwise $(\hat S, \hat T)$ would be a tightness defect of S', contradicting its irreducibility. Using this inequality we immediately get that $S' \cup \mathcal{T}^\top(S)$ is a subset of S that strictly comprises $\mathcal{T}^\top(S)$ and whose deficiency is no worse than that of $\mathcal{T}^\top(S)$. This, however, contradicts Theorem 4.3.3. Hence we have $S' \subseteq \mathcal{T}^\top(S)$ as desired. □





4.4 Levelwise Flexible Expression Motion 

In this section the main results of Section 4.3 are utilised in order to construct the proposed algorithm for inductively lifetime optimal expression motion. We will call this transformation Levelwise Flexible Expression Motion, for short $LFEM_\Phi$. The adjective flexible addresses the fact that its insertion points are no longer chosen according to a fixed strategy, as in the case of $BEM_\Phi$ and $LEM_\Phi$. Instead, they are determined flexibly somewhere between the earliest and the latest initialisation points, incorporating trade-offs between the lifetime ranges of different temporaries. However, the method progresses in a levelwise fashion rather than considering all expressions at once. We will discuss the limitations of this approach in Section 4.5.2. This discussion serves as the motivation for a fully fledged expression motion algorithm that is not limited in this way.



4.4.1 Defining $LFEM_\Phi$

As for $BEM_\Phi$ and $LEM_\Phi$, the component transformations of $LFEM_\Phi$ are such that original computations are always replaced. Hence we have for any $\psi \in \Phi$ and $n \in N$:

$$LFEM_\Phi\text{-}Replace^\psi_n \Leftrightarrow Comp^\psi_n$$

According to Definition 4.2.1, this ensures that actual replacements of the structured transformation with respect to $\Phi$ are restricted to maximal occurrences:

$$LFEM_\Phi\text{-}Replace^\psi_n \Rightarrow \psi \in MaxSubExpr_\Phi(\varphi^{RHS}_n),$$

where $\varphi^{RHS}_n$ addresses the right-hand side expression associated with n.

Insertions of the transformation are determined by a process that refines delayability, as defined in Definition 3.3.3, level by level, starting with the minimal expression patterns, i.e. the 0-level expressions of $\Phi$.

I. The Induction Base: expressions in $\Phi^0$

Minimal expression patterns are initialised according to the individual lazy expression motion transformations. Thus we have for any $n \in N$ and expression pattern $\psi \in \Phi^0$:

$$LFEM_\Phi\text{-}Insert^\psi_n \Leftrightarrow LEM_\Phi\text{-}Insert^\psi_n$$

II. The Induction Step: expressions in $\Phi^i$, $i \ge 1$

The algorithm proceeds from level 1 to higher levels. For each level it computes the optimal initialisation points of the expressions at that level, while capturing profitable trade-offs between the current level and the levels already processed. To this end, the local trade-off information, which is determined by means of the techniques of Section 4.3, is globalised by adjustments of the central predicates involved in the definition of lazy expression motion.

⁵ The idea to use our algorithm for lazy expression motion as the basis of a refinement process also led to an algorithm for lazy strength reduction [KRS93] that combines expression motion with strength reduction [ACK81, CK77, CP91, JD82a, JD82b, Dha89b, Dha89a].

II.a First Adjustment. For structured expression motion, Definition 4.2.1 imposes a constraint on the order of initialisations. This constraint is reflected in a modification of the down-safety predicate: the range of down-safe program points is further restricted to program points where it can be guaranteed that the immediate subexpressions have been initialised before:

$$A1\text{-}DnSafe^\psi_n \stackrel{def}{\Leftrightarrow} DnSafe^\psi_n \wedge \forall \varphi \in SubExpr_\Phi(\psi).\ LFEM_\Phi\text{-}Correct^\varphi_n$$

This definition straightforwardly induces modified predicates for safety, earliestness, delayability and latestness, respectively. These are all marked by a preceding A1-, as for instance A1-Delayed.

II.b Second Adjustment. Our central concern, capturing trade-offs between the lifetimes of temporaries, is now addressed by means of a further adjustment of the delayability property. The delayability of i-level expressions is terminated at program points where their premature initialisation pays off in terms of the overall lifetimes of temporaries.

To this end we introduce two sets of trade-off candidates associated with every program point n:

$\Theta^{up}_n(i)$, the set of upper trade-off candidates at n, refers to expressions of level i whose initialisation is possible at n but can optionally be postponed to strictly later program points as well:

$$\Theta^{up}_n(i) \stackrel{def}{=} \{\psi \in \Phi^i \mid A1\text{-}Delayed^\psi_n \wedge \neg A1\text{-}Latest^\psi_n\}$$

$\Theta^{dn}_n(i)$, the set of lower trade-off candidates at n, refers to expressions of level strictly lower than i whose associated lifetime ranges may possibly be terminated at n. This excludes expressions whose associated lifetime ranges have to survive n anyhow.

For a formal criterion we introduce a predicate $UsedLater[\Psi]$. It captures situations where a lifetime range always has to survive the program point, irrespective of the expression motion under consideration. The predicate is supplemented by a structured subset of expressions $\Psi \subseteq \Phi$ that fixes its scope:⁶

$$UsedLater^\psi_n[\Psi] \stackrel{def}{\Leftrightarrow} \exists p \in P[n,e]\ \exists 1 < j \le |p|.\ \big(LFEM_\Phi\text{-}Replace^\psi_{p_j} \vee \exists \varphi \in SupExpr_\Psi(\psi).\ A1\text{-}Earliest^\varphi_{p_j}\big) \wedge \forall 1 < k \le j.\ \neg A1\text{-}Earliest^\psi_{p_k}$$

⁶ For expressions of maximal level, $UsedLater[\Phi]$ coincides with the complement of the isolation predicate in [KRS92, KRS94a]. That predicate was introduced in order to eliminate insertions that are only used immediately afterwards. In fact, in $LFEM_\Phi$ this step evolves as the natural add-on that results from exploiting UsedLater-information also with respect to the maximum level.
Based upon this predicate, the formal definition of $\Theta^{dn}_n(i)$ is:

$$\Theta^{dn}_n(i) \stackrel{def}{=} \{\psi \in \Phi^{<i} \mid \neg UsedLater^\psi_n[\Phi^{\le i}] \wedge SupExpr_{\Theta^{up}_n(i)}(\psi) \neq \emptyset\}$$

$\Theta^{up}_n(i)$ and $\Theta^{dn}_n(i)$ form the vertices of a bipartite graph whose edges are given by the immediate subexpression relation. Algorithm 4.3.1 is used in order to define a local predicate LChange capturing those computations whose premature initialisations are profitable with respect to the expressions of levels below:

$$LChange^\psi_n \stackrel{def}{\Leftrightarrow} \psi \in neigh(\mathcal{T}^\top(\Theta^{dn}_n(i)))$$
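The local step behind LChange can be sketched in Python under several assumptions. Expressions are hypothetical strings, lower candidates play the role of S, and upper candidates play the role of T. Edges (lower, upper) encode the immediate subexpression relation. The maximum matching is computed with a depth-first augmenting-path search swapped in for brevity, and the search of Algorithm 4.3.1 is run with a visited set. All names are hypothetical, and the sketch covers only one program point, not the globalisation.

```python
def max_matching(S, edges):
    """Maximum matching {s: t} via depth-first augmenting paths."""
    adj = {s: [t for u, t in edges if u == s] for s in S}
    match_t = {}

    def augment(s, seen):
        for t in adj[s]:
            if t not in seen:
                seen.add(t)
                if t not in match_t or augment(match_t[t], seen):
                    match_t[t] = s
                    return True
        return False

    for s in S:
        augment(s, set())
    return {s: t for t, s in match_t.items()}


def largest_tight_set(S, T, edges):
    """Algorithm 4.3.1 on top of the matching above (with a visited set)."""
    match_s = max_matching(S, edges)
    partner = {**match_s, **{t: s for s, t in match_s.items()}}
    lower, S_m = set(S), set(S)
    R = [t for t in T if t not in partner]
    seen = set(R)
    while R:
        x = R.pop()
        if x in lower:
            S_m.discard(x)
            nxt = [partner[x]] if x in partner else []
        else:
            nxt = [s for s, u in edges if u == x]
        for y in nxt:
            if y not in seen:
                seen.add(y)
                R.append(y)
    return S_m


def lchange(lower, upper, edges):
    """Upper candidates adjacent to the largest tight set of lower candidates."""
    tight = largest_tight_set(lower, upper, edges)
    return {t for s, t in edges if s in tight}


lower = ["a+b", "c+d", "e+f"]
upper = ["(a+b)*(c+d)", "(e+f)*g", "(e+f)*h"]
edges = [("a+b", "(a+b)*(c+d)"), ("c+d", "(a+b)*(c+d)"),
         ("e+f", "(e+f)*g"), ("e+f", "(e+f)*h")]
print(lchange(lower, upper, edges))  # {'(a+b)*(c+d)'}
```

In this toy situation, initialising (a+b)*(c+d) releases two operand temporaries for one new temporary, so it is selected. Initialising both superexpressions of e+f early would cost two temporaries to release one, so neither is selected.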

This new local predicate is taken into account for a second adjustment of delayability. In addition to the previous conditions, the delayability of an upper trade-off candidate is terminated as soon as a profitable change becomes possible:

$$A2\text{-}Delayed^\psi_n \stackrel{def}{\Leftrightarrow} \forall p \in P[s,n]\ \exists j \le |p|.\ A1\text{-}Earliest^\psi_{p_j} \wedge \forall j \le k < |p|.\ \neg Comp^\psi_{p_k} \wedge \neg LChange^\psi_{p_k}$$

The modified delayability predicate induces a correspondingly adjusted predicate for latestness, which is also marked by the prefix A2-:

$$A2\text{-}Latest^\psi_n \stackrel{def}{\Leftrightarrow} A2\text{-}Delayed^\psi_n \wedge (Comp^\psi_n \vee \exists m \in succ(n).\ \neg A2\text{-}Delayed^\psi_m)$$

II.c Initialisation Points: The results determined in step II.b induce the insertion points for $\psi \in \Phi^i$:

$$LFEM_\Phi\text{-}Insert^\psi_n \stackrel{def}{\Leftrightarrow} A2\text{-}Latest^\psi_n$$
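For a single expression pattern ψ, the path-based definitions of A2-Delayed and A2-Latest can be evaluated with a minimal Python sketch. The sketch assumes that A1-Earliest, Comp and LChange are already given as per-node booleans, and that every node is reachable from the start node. A2-Delayed is computed as the greatest fixpoint of the usual forward all-paths equation: a node is delayed if it is earliest, or if all predecessors are delayed and neither compute ψ nor carry LChange. A2-Latest is then read off locally. The function names and the four-node example are hypothetical.

```python
def a2_delayed(nodes, start, preds, earliest, comp, lchange):
    """Greatest fixpoint of
         D(n) = Earliest(n) or (n != start and for all m in pred(n):
                                D(m) and not Comp(m) and not LChange(m))
    """
    delayed = {n: True for n in nodes}  # start from the top element
    changed = True
    while changed:
        changed = False
        for n in nodes:
            through_preds = n != start and all(
                delayed[m] and not comp[m] and not lchange[m] for m in preds[n])
            new = earliest[n] or through_preds
            if new != delayed[n]:
                delayed[n] = new
                changed = True
    return delayed


def a2_latest(nodes, succs, delayed, comp):
    """Delayed, and either computed here or some successor is not delayed."""
    return {n: delayed[n] and (comp[n] or any(not delayed[m] for m in succs[n]))
            for n in nodes}


# Straight-line flow graph 1 -> 2 -> 3 -> 4; psi is earliest at 1, used at 4.
nodes = [1, 2, 3, 4]
preds = {1: [], 2: [1], 3: [2], 4: [3]}
succs = {1: [2], 2: [3], 3: [4], 4: []}
earliest = {1: True, 2: False, 3: False, 4: False}
comp = {1: False, 2: False, 3: False, 4: True}
no_change = {n: False for n in nodes}
change_at_2 = {**no_change, 2: True}  # a profitable trade-off arises at 2

d = a2_delayed(nodes, 1, preds, earliest, comp, no_change)
latest_plain = a2_latest(nodes, succs, d, comp)
d = a2_delayed(nodes, 1, preds, earliest, comp, change_at_2)
latest_change = a2_latest(nodes, succs, d, comp)
print([n for n in nodes if latest_plain[n]])   # [4]  (plain lazy placement)
print([n for n in nodes if latest_change[n]])  # [2]  (stopped by LChange)
```

Without LChange, the initialisation sinks down to its use at node 4, as in lazy expression motion. With LChange at node 2, delayability is cut off there and the insertion point moves up to node 2.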

4.4.2 Proving Admissibility 

The proof of admissibility of $LFEM_\Phi$ is quite straightforward, since the definitions of the predicates are already tuned in a way that the conditions of Definition 4.2.1 are automatically met. We start by collecting some properties of adjusted delayability.

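Lemma 4.4.1 (Delayability Lemma for $LFEM_\Phi$).

1. $\forall \psi \in \Phi, n \in N.\ A2\text{-}Delayed^\psi_n \Rightarrow A1\text{-}Delayed^\psi_n \Rightarrow Delayed^\psi_n$

2. $\forall \psi \in \Phi, n \in N.\ A1\text{-}Delayed^\psi_n \Rightarrow A1\text{-}DnSafe^\psi_n \Rightarrow DnSafe^\psi_n$

3. $\forall \psi \in \Phi, n \in N.\ A1\text{-}Delayed^\psi_n \Leftrightarrow Delayed^\psi_n \wedge A1\text{-}DnSafe^\psi_n$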

Proof. Part (1) is directly due to the definitions of the different notions of delayability. The first implication of Part (2) can be proved along the lines of Lemma 3.4.1(1) by using the corresponding adjusted notions, whereas the second implication is immediate from the definition of A1-DnSafe. The ⇒-direction of Part (3) is immediately implied by Parts (1) and (2). Finally, the ⇐-direction of Part (3) follows from a path-based argument:

$$\begin{aligned}
& Delayed^\psi_n \\
\Rightarrow\ & \forall p \in P[s,n]\ \exists i \le |p|.\ Earliest^\psi_{p_i} \wedge \forall i \le k < |p|.\ \neg Comp^\psi_{p_k} && \text{[Definition } Delayed\text{]}\\
\Rightarrow\ & \forall p \in P[s,n]\ \exists i \le |p|.\ A1\text{-}Earliest^\psi_{p_i} \wedge \forall i \le k < |p|.\ \neg Comp^\psi_{p_k} && \text{[(4.5)]}\\
\Rightarrow\ & A1\text{-}Delayed^\psi_n && \text{[Definition } A1\text{-}Delayed\text{]}
\end{aligned}$$

Implication (4.5) is justified because the assumption $A1\text{-}DnSafe^\psi_n$ ensures that there is a least index $i \le j \le |p|$ such that $A1\text{-}DnSafe^\psi_{p_k}$ holds for all $j \le k \le |p|$. For this j the condition $Earliest^\psi_{p_i}$ implies $A1\text{-}Earliest^\psi_{p_j}$. This is trivial if $i = j$; otherwise one can easily check that $\neg UpSafe^\psi_{p_k}$ is true for any $i \le k < j$. □

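Finally, for latestness as the limit points of delayability we have some strong characterisations:

Lemma 4.4.2 (Latestness Lemma for $LFEM_\Phi$).

1. $\forall \psi \in \Phi, n \in N.\ A1\text{-}Latest^\psi_n \Rightarrow LFEM_\Phi\text{-}Correct^\psi_n$

2. $\forall \psi \in \Phi, n \in N.\ Latest^\psi_n \Rightarrow A1\text{-}DnSafe^\psi_n$

3. $\forall \psi \in \Phi, n \in N.\ Latest^\psi_n \Leftrightarrow A1\text{-}Latest^\psi_n$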

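Proof. For Part (1) we use that $A1\text{-}Latest^\psi_n$ together with Lemma 4.4.1(1) yields

$$Comp^\psi_n \vee \exists m \in succ(n).\ \neg A2\text{-}Delayed^\psi_m \qquad (4.6)$$

Then we have

$$\begin{aligned}
& A1\text{-}Latest^\psi_n \\
\Rightarrow\ & A1\text{-}Delayed^\psi_n && \text{[Definition } A1\text{-}Latest\text{]}\\
\Rightarrow\ & \forall p \in P[s,n]\ \exists i \le |p|.\ A1\text{-}Earliest^\psi_{p_i} \wedge \forall i \le k < |p|.\ \neg Comp^\psi_{p_k} && \text{[Definition } A1\text{-}Delayed\text{]}\\
\Rightarrow\ & \forall p \in P[s,n]\ \exists i \le |p|.\ A2\text{-}Delayed^\psi_{p_i} \wedge \forall i \le k < |p|.\ \neg Comp^\psi_{p_k} && \text{[Definition } A2\text{-}Delayed\text{]}\\
\Rightarrow\ & \forall p \in P[s,n]\ \exists i \le |p|.\ A2\text{-}Latest^\psi_{p_i} \wedge \forall i \le k < |p|.\ \neg Comp^\psi_{p_k} && \text{[(4.6) \& Definition } A2\text{-}Delayed\text{]}\\
\Rightarrow\ & \forall p \in P[s,n]\ \exists i \le |p|.\ A2\text{-}Latest^\psi_{p_i} \wedge \forall i \le k < |p|.\ Transp^\psi_{p_k} && \text{[(4.7)]}\\
\Rightarrow\ & LFEM_\Phi\text{-}Correct^\psi_n && \text{[Definition } LFEM_\Phi\text{-}Correct\text{]}
\end{aligned}$$

To prove Implication (4.7), let us assume an index k with $i \le k < |p|$ such that $Transp^\psi_{p_k}$ is violated. As $A2\text{-}Latest^\psi_{p_i}$ implies $DnSafe^\psi_{p_i}$ (cf. Lemma 4.4.1(1&2)), the definition of down-safety would mean that there is an index l with $i \le l \le k$ such that $Comp^\psi_{p_l}$ is true. This contradicts the premise of the implication.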

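Part (2) is proved together with the ⇒-direction of Part (3) by means of an induction on the structure of ψ. For the induction base $\psi \in \Phi^0$, Part (2) is due to the following sequence of implications:

$$Latest^\psi_n \Rightarrow Delayed^\psi_n \Rightarrow DnSafe^\psi_n \Rightarrow A1\text{-}DnSafe^\psi_n,$$

where the first implication is due to the definition of $Latest^\psi_n$ and the second one follows from Lemma 3.4.1(1). The third one is a consequence of the fact that $\psi \in \Phi^0$, combined with the definition of $A1\text{-}DnSafe^\psi_n$. Then Part (2) can already be used for the ⇒-direction of Part (3):

$$\begin{aligned}
& Latest^\psi_n \\
\Rightarrow\ & Delayed^\psi_n \wedge (Comp^\psi_n \vee \exists m \in succ(n).\ \neg Delayed^\psi_m) && \text{[Definition } Latest\text{]}\\
\Rightarrow\ & Delayed^\psi_n \wedge A1\text{-}DnSafe^\psi_n \wedge (Comp^\psi_n \vee \exists m \in succ(n).\ \neg Delayed^\psi_m) && \text{[Part (2)]}\\
\Rightarrow\ & A1\text{-}Delayed^\psi_n \wedge (Comp^\psi_n \vee \exists m \in succ(n).\ \neg A1\text{-}Delayed^\psi_m) && \text{[Lemma 4.4.1(1\&3)]}\\
\Rightarrow\ & A1\text{-}Latest^\psi_n && \text{[Definition } A1\text{-}Latest\text{]}
\end{aligned}$$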

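For the induction step we consider $\psi \in \Phi^i$ ($i \ge 1$) and first prove the implication of Part (2):

$$\begin{aligned}
& Latest^\psi_n \\
\Rightarrow\ & DnSafe^\psi_n \wedge \forall \varphi \in SubExpr_\Phi(\psi).\ LEM_\Phi\text{-}Correct^\varphi_n && \text{[Def. } Latest \text{ \& Lemma 3.4.1(2) \& } LEM_\Phi \in \mathcal{AEM}_\Phi\text{]}\\
\Rightarrow\ & DnSafe^\psi_n \wedge \forall \varphi \in SubExpr_\Phi(\psi), p \in P[s,n]\ \exists j \le |p|.\ Latest^\varphi_{p_j} \wedge \forall j \le k < |p|.\ Transp^\varphi_{p_k} && \text{[Def. } LEM_\Phi\text{-}Correct\text{]}\\
\Rightarrow\ & DnSafe^\psi_n \wedge \forall \varphi \in SubExpr_\Phi(\psi), p \in P[s,n]\ \exists j \le |p|.\ A1\text{-}Latest^\varphi_{p_j} \wedge \forall j \le k < |p|.\ Transp^\varphi_{p_k} && \text{[Induction Hypothesis]}\\
\Rightarrow\ & DnSafe^\psi_n \wedge \forall \varphi \in SubExpr_\Phi(\psi), p \in P[s,n]\ \exists j \le |p|.\ LFEM_\Phi\text{-}Correct^\varphi_{p_j} \wedge \forall j \le k < |p|.\ Transp^\varphi_{p_k} && \text{[Part (1)]}\\
\Rightarrow\ & DnSafe^\psi_n \wedge \forall \varphi \in SubExpr_\Phi(\psi).\ LFEM_\Phi\text{-}Correct^\varphi_n && \text{[Def. } LFEM_\Phi\text{-}Correct\text{]}\\
\Rightarrow\ & A1\text{-}DnSafe^\psi_n && \text{[Def. } A1\text{-}DnSafe\text{]}
\end{aligned}$$

The induction step of the ⇒-direction of Part (3) can be done as in the induction base.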

Finally, to prove the ⇐-direction of Part (3), let us assume $A1\text{-}Latest^\psi_n$, which by definition means

$$A1\text{-}Delayed^\psi_n \wedge (Comp^\psi_n \vee \exists m \in succ(n).\ \neg A1\text{-}Delayed^\psi_m)$$

Using Lemma 4.4.1(3) this implies

$$A1\text{-}Delayed^\psi_n \wedge (Comp^\psi_n \vee \exists m \in succ(n).\ \neg Delayed^\psi_m \vee \neg DnSafe^\psi_m)$$

At this point it is easy to see that the case $\neg DnSafe^\psi_m$ is subsumed by $Comp^\psi_n$. To this end we use Lemma 4.4.1(2), which delivers $DnSafe^\psi_n$. By definition of DnSafe, this in combination with $\neg DnSafe^\psi_m$ implies $Comp^\psi_n$. Therefore, we get

$$A1\text{-}Delayed^\psi_n \wedge (Comp^\psi_n \vee \exists m \in succ(n).\ \neg Delayed^\psi_m),$$

which after a final application of Lemma 4.4.1(1) yields $Latest^\psi_n$ as desired.

□

These results suffice to show the admissibility of $LFEM_\Phi$.

Theorem 4.4.1 (Transformation Theorem for $LFEM_\Phi$). $LFEM_\Phi$ is

1. a structured expression motion in the sense of Definition 4.2.1 and

2. admissible, i.e. $LFEM_\Phi \in \mathcal{AEM}_\Phi$.

Proof. For the first part one has to show that the initialisations of subexpressions are already available at their initialisation points, i.e.

i) $\forall \psi \in \Phi, n \in N.\ LFEM_\Phi\text{-}Insert^\psi_n \Rightarrow \forall \varphi \in SubExpr_\Phi(\psi).\ LFEM_\Phi\text{-}Correct^\varphi_n$.
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For the second part, i. e. the admissibility of LFEM,^ , it is sufficient to show 

ii) V'0 e h £ N. LFEM^-Insertf Safef 

iii) W £ <P, h £ N . LFEM^- Replace^ => LFEM^,- Correct^ 

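Points i) and ii) can be proved simultaneously:

$\mathit{LFEM}_\Phi\text{-}\mathit{Insert}^{\psi}_{\dot n}$

[Def. $\mathit{LFEM}_\Phi$-Insert] $\Rightarrow \mathit{A2\text{-}Delayed}^{\psi}_{\dot n}$

[Lemma 4.4.1(1)] $\Rightarrow \mathit{A1\text{-}Delayed}^{\psi}_{\dot n}$

[Lemma 4.4.1(2)] $\Rightarrow \mathit{A1\text{-}DnSafe}^{\psi}_{\dot n}$

[Def. A1-DnSafe] $\Rightarrow \mathit{DnSafe}^{\psi}_{\dot n} \wedge \forall \varphi \in \mathit{SubExpr}_\Phi(\psi).\ \mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n}$

The second conjunct is point i), and since down-safety implies safety, the first conjunct yields point ii).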

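Finally, let us consider point iii). Here we have:

$\mathit{LFEM}_\Phi\text{-}\mathit{Replace}^{\psi}_{\dot n}$

[Def. $\mathit{LFEM}_\Phi$ & $\mathit{LEM}_\Phi$] $\Rightarrow \mathit{LEM}_\Phi\text{-}\mathit{Replace}^{\psi}_{\dot n}$

[$\mathit{LEM}_\Phi \in \mathcal{AEM}_\Phi$] $\Rightarrow \mathit{LEM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$

[Def. $\mathit{LEM}_\Phi$-Correct] $\Rightarrow \forall p \in \mathbf{P}[\dot s, \dot n]\ \exists 1 \le i \le |p|.\ \mathit{Latest}^{\psi}_{p_i} \wedge \forall i \le j < |p|.\ \mathit{Transp}^{\psi}_{p_j}$

[Lemma 4.4.2(1&3)] $\Rightarrow \forall p \in \mathbf{P}[\dot s, \dot n]\ \exists 1 \le i \le |p|.\ \mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\psi}_{p_i} \wedge \forall i \le j < |p|.\ \mathit{Transp}^{\psi}_{p_j}$

[Def. $\mathit{LFEM}_\Phi$-Correct] $\Rightarrow \mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$ □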

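With the admissibility of $\mathit{LFEM}_\Phi$ it is now possible to state a result which complements the Structured Safety Lemma 4.2.2:

Lemma 4.4.3 (Structured Adjusted Safety Lemma).

$\forall \psi \in \Phi, \varphi \in \mathit{SubExpr}_\Phi(\psi), \dot n \in \dot N.\ \mathit{A1\text{-}DnSafe}^{\psi}_{\dot n} \Rightarrow \mathit{A1\text{-}DnSafe}^{\varphi}_{\dot n}$

Proof. Let $\psi \in \Phi$ with $\mathit{A1\text{-}DnSafe}^{\psi}_{\dot n}$ and $\varphi \in \mathit{SubExpr}_\Phi(\psi)$. According to the definition of A1-DnSafe and Lemma 4.2.2(2) we get $\mathit{DnSafe}^{\varphi}_{\dot n}$. Hence it only remains to show

$$\forall \varphi' \in \mathit{SubExpr}_\Phi(\varphi).\ \mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\varphi'}_{\dot n} \tag{4.8}$$

To this end we use that $\mathit{A1\text{-}DnSafe}^{\psi}_{\dot n}$ implies $\mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n}$, and then exploit that the admissibility of $\mathit{LFEM}_\Phi$ allows us to apply Lemma 4.2.4 in order to obtain Property (4.8) as desired. □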
4.4.3 Proving Computational Optimality

Central to the computational optimality of $\mathit{LFEM}_\Phi$ is the following characterisation of the behaviour of the adjusted predicates on intervals between earliest and latest program points (see Figure 4.11 for an illustration). Since the lemma is used in an induction proof of computational optimality, the individual transformations belonging to the subexpressions of the expression under consideration are assumed to be computationally optimal.



[Figure omitted; surviving labels: Earliest, Latest, DnSafe, Delayed.]

Fig. 4.11. Situation in Lemma 4.4.4



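Lemma 4.4.4 (Adjustment Lemma for $\mathit{LFEM}_\Phi$).

Let $\psi \in \Phi$ such that (IH) $\mathit{LFEM}_\Phi$ is computationally optimal for any $\varphi \in \mathit{SubExpr}_\Phi(\psi)$. Moreover, let $p$ be an interval of program points between the earliest and the latest program point (cf. Lemma 3.4.2(2)), i.e. a path with (A1) $\mathit{Earliest}^{\psi}_{p_1}$, (A2) $\mathit{Latest}^{\psi}_{p_{|p|}}$ and (A3) $\forall 1 \le k < |p|.\ \mathit{Delayed}^{\psi}_{p_k} \wedge \neg\mathit{Latest}^{\psi}_{p_k}$. Then there are indices $1 \le i \le j \le |p|$ such that:

1. $\forall 1 \le k \le |p|.\ \mathit{A1\text{-}Earliest}^{\psi}_{p_k} \Leftrightarrow (i = k)$

2. $\forall 1 \le k \le |p|.\ \mathit{A1\text{-}Delayed}^{\psi}_{p_k} \Leftrightarrow (i \le k)$

3. $\forall 1 \le k \le |p|.\ \mathit{A2\text{-}Delayed}^{\psi}_{p_k} \Leftrightarrow (i \le k \le j)$

4. $\forall 1 \le k \le |p|.\ \mathit{A2\text{-}Latest}^{\psi}_{p_k} \Leftrightarrow (j = k)$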

Proof. First let us collect some properties of the path $p$ under consideration:

$$\forall 1 \le k < |p|.\ \neg\mathit{Comp}^{\psi}_{p_k} \tag{4.9a}$$

$$\forall 1 \le k \le |p|.\ \mathit{DnSafe}^{\psi}_{p_k} \tag{4.9b}$$

$$\forall 1 \le k \le |p|.\ \neg\mathit{UpSafe}^{\psi}_{p_k} \tag{4.9c}$$

$$\forall 1 \le k < |p|.\ \mathit{Transp}^{\psi}_{p_k} \tag{4.9d}$$

Property (4.9a) is an immediate consequence of Assumptions (A2) and (A3), using the definitions of delayability and latestness. Moreover, according to Lemma 4.4.1(2), Assumptions (A2) and (A3) also imply Property (4.9b). The earliestness of $p_1$ together with Property (4.9a) induces Property (4.9c). For the proof of Property (4.9d) let us assume an index $k < |p|$ with $\neg\mathit{Transp}^{\psi}_{p_k}$. As, according to Property (4.9a), $\neg\mathit{Comp}^{\psi}_{p_k}$ also holds, we obtain $\neg\mathit{DnSafe}^{\psi}_{p_k}$, in contradiction to Property (4.9b). Now we shall in turn deal with the proofs of Points (1) to (4):
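Proof of 1: Let us choose $i$ as the smallest index that satisfies $\mathit{A1\text{-}DnSafe}^{\psi}_{p_i}$. Such an index exists, since Assumption (A2) combined with Lemma 4.4.2(2) ensures $\mathit{A1\text{-}DnSafe}^{\psi}_{p_{|p|}}$. With this choice, $\mathit{A1\text{-}Earliest}^{\psi}_{p_i}$ is implied by Assumption (A1) if $i = 1$; otherwise it follows from Property (4.9c). To show also the uniqueness of $i$, i.e. the forward-implication of Point (1), it is sufficient to show

$$\forall k \ge i.\ \mathit{A1\text{-}DnSafe}^{\psi}_{p_k} \tag{4.10}$$

For $i \le k < |p|$ we have

$\mathit{A1\text{-}DnSafe}^{\psi}_{p_k}$

[Definition A1-DnSafe] $\Rightarrow \mathit{DnSafe}^{\psi}_{p_k} \wedge \forall \varphi \in \mathit{SubExpr}_\Phi(\psi).\ \mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{p_k}$

[Property (4.9b)] $\Rightarrow \mathit{DnSafe}^{\psi}_{p_{k+1}} \wedge \forall \varphi \in \mathit{SubExpr}_\Phi(\psi).\ \mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{p_k}$

[(4.11)] $\Rightarrow \mathit{DnSafe}^{\psi}_{p_{k+1}} \wedge \forall \varphi \in \mathit{SubExpr}_\Phi(\psi).\ \mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{p_{k+1}}$

[Definition A1-DnSafe] $\Rightarrow \mathit{A1\text{-}DnSafe}^{\psi}_{p_{k+1}}$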

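For Implication (4.11) let us assume $\varphi \in \mathit{SubExpr}_\Phi(\psi)$ with $\mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{p_k}$. We now show that $\mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{p_{k+1}}$ must hold as well. As Assumption (A3) ensures $\mathit{Delayed}^{\psi}_{p_{k+1}}$, Lemma 4.2.5(1) yields

$\underbrace{\mathit{Delayed}^{\varphi}_{p_{k+1}} \wedge \neg\mathit{Latest}^{\varphi}_{p_{k+1}}}_{\text{i)}} \vee \underbrace{\mathit{LEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{p_{k+1}}}_{\text{ii)}}$

In case (i) we argue with Lemma 3.4.2(2) and Remark 3.4.1, which are applicable since, by (IH), $\mathit{LFEM}_\Phi$ is assumed to be computationally optimal for $\varphi$. These results establish that $\mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{p_k}$ carries over to $p_{k+1}$, yielding $\mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{p_{k+1}}$.

In case (ii) we can argue similarly. Here we succeed using Corollary 3.4.1, which allows us to conclude $\mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{p_{k+1}}$ from $\mathit{LEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{p_{k+1}}$. Again this requires exploiting Assumption (IH).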

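Proof of 2: For indices $k < i$ we have $\neg\mathit{A1\text{-}DnSafe}^{\psi}_{p_k}$ by construction, which also forces $\neg\mathit{A1\text{-}Delayed}^{\psi}_{p_k}$ according to Lemma 4.4.1(2). On the other hand, for $k \ge i$ we have, using Property (4.10) as well as Assumptions (A2) and (A3):

$\forall k \ge i.\ \mathit{Delayed}^{\psi}_{p_k} \wedge \mathit{A1\text{-}DnSafe}^{\psi}_{p_k}$,

which according to Lemma 4.4.1 means $\forall k \ge i.\ \mathit{A1\text{-}Delayed}^{\psi}_{p_k}$.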
Proof of 3: Let us choose index $j$ as the largest index in the range $\{i, \ldots, |p|\}$ such that the backward-implication of Part (3) is valid, i.e. $\forall i \le k \le j.\ \mathit{A2\text{-}Delayed}^{\psi}_{p_k}$. Note that at least $\mathit{A2\text{-}Delayed}^{\psi}_{p_i}$ holds, and thus the existence of such an index $j$ is ensured. For the forward-implication it remains to prove that

$\underbrace{\forall 1 \le k < i.\ \neg\mathit{A2\text{-}Delayed}^{\psi}_{p_k}}_{\text{i)}} \wedge \underbrace{\forall j < k \le |p|.\ \neg\mathit{A2\text{-}Delayed}^{\psi}_{p_k}}_{\text{ii)}}$

First, the fact that Part (2) ensures $\forall 1 \le k < i.\ \neg\mathit{A1\text{-}Delayed}^{\psi}_{p_k}$ also grants (i), using Lemma 4.4.1(1). For the proof of (ii) we may assume without loss of generality that $j < |p|$. Then $\neg\mathit{A2\text{-}Delayed}^{\psi}_{p_{j+1}}$ is satisfied by definition of $j$, and for $j < k < |p|$ the following inductive argument applies:

$\neg\mathit{A2\text{-}Delayed}^{\psi}_{p_k} \Rightarrow \neg\mathit{A2\text{-}Delayed}^{\psi}_{p_{k+1}} \vee \mathit{A1\text{-}Earliest}^{\psi}_{p_{k+1}} \Rightarrow \neg\mathit{A2\text{-}Delayed}^{\psi}_{p_{k+1}}$,

where the first implication is a trivial consequence of the path characterisation of A2-Delayed (A2-Delayed can only be triggered at program points satisfying the A1-Earliest predicate) and the second one is a consequence of Part (1), which ensures $\neg\mathit{A1\text{-}Earliest}^{\psi}_{p_{k+1}}$. Finally, Part (3) immediately implies Point (4) if $j < |p|$. Otherwise, Part (4) can be concluded from $\mathit{A1\text{-}Latest}^{\psi}_{p_{|p|}}$. □
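To make the index structure of Lemma 4.4.4 concrete, the following Python sketch computes the indices $i$ and $j$ for a single interval path from per-point predicate values. It is only an illustration under stated assumptions: the boolean-list encoding and all function names are hypothetical, $i$ is chosen as in the proof of Point (1), and $j$ is taken as the first point at or after $p_i$ that satisfies LChange, which is our reading of how Trade-off Lemma 4.4.8 terminates A2-Delayed along a path.

```python
from typing import Dict, List, Tuple


def interval_indices(a1_dnsafe: List[bool], lchange: List[bool]) -> Tuple[int, int]:
    """Return 1-based indices (i, j) of Lemma 4.4.4 for one interval path p.

    a1_dnsafe[k-1] encodes A1-DnSafe at p_k and lchange[k-1] encodes LChange
    at p_k (hypothetical encoding, one entry per program point on p).
    """
    n = len(a1_dnsafe)
    if n == 0 or len(lchange) != n:
        raise ValueError("predicate lists must be non-empty and of equal length")
    # Proof of Point (1): i is the smallest index satisfying A1-DnSafe.
    i = next((k for k in range(1, n + 1) if a1_dnsafe[k - 1]), None)
    if i is None:
        raise ValueError("A1-DnSafe must hold at least at p_|p|")
    # Property (4.10): A1-DnSafe holds at every p_k with k >= i.
    if not all(a1_dnsafe[i - 1:]):
        raise ValueError("A1-DnSafe is expected to hold from p_i onwards")
    # Assumption: A2-Delayed stops at the first LChange point at or after p_i,
    # otherwise it runs up to the end of the interval.
    j = next((k for k in range(i, n + 1) if lchange[k - 1]), n)
    return i, j


def predicate_profiles(n: int, i: int, j: int) -> Dict[str, List[bool]]:
    """Truth values along p as stated in Points (1)-(4) of Lemma 4.4.4."""
    ks = range(1, n + 1)
    return {
        "A1-Earliest": [k == i for k in ks],
        "A1-Delayed": [k >= i for k in ks],
        "A2-Delayed": [i <= k <= j for k in ks],
        "A2-Latest": [k == j for k in ks],
    }
```

For instance, with A1-DnSafe holding from $p_3$ on and LChange first holding at $p_4$ on a path of six points, `interval_indices` returns `(3, 4)`, so the adjusted insertion (A2-Latest) lies at $p_4$.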

Lemma 4.4.4 can be applied levelwise, establishing the computational optimality of the individual $\mathit{LFEM}_\varphi$-transformations for any $\varphi \in \Phi$. This directly carries over to $\mathit{LFEM}_\Phi$, yielding:

Theorem 4.4.2 (Computational Optimality Theorem for $\mathit{LFEM}_\Phi$).

$\mathit{LFEM}_\Phi$ is computationally optimal, i.e. $\mathit{LFEM}_\Phi \in \mathcal{COEM}_\Phi$.



We close Section 4.4.3 by providing two further results that crucially depend on the computational optimality of $\mathit{LFEM}_\Phi$. Lemma 4.4.4 can be understood as the counterpart of Lemma 3.4.2(2); in addition, the first part of Lemma 3.4.2, which states that computationally optimal transformations must place their insertions inside intervals of delayable program points, has its equivalent in the first part of the following lemma. Moreover, the second part expresses the fact that subexpressions must be available over the whole range of adjusted delayability. Note that this part has some similarities to Lemma 4.2.5(3).

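Lemma 4.4.5 (Insertion Lemma for $\mathit{LFEM}_\Phi$). Let $\mathit{EM}_\Phi \in \mathcal{COEM}_\Phi$ such that $\mathit{EM}_{\Phi^{<i}} = \mathit{LFEM}_{\Phi^{<i}}$, $\psi \in \Phi^i$ and $\dot n \in \dot N$. Then we have:

1. $\mathit{EM}_\Phi\text{-}\mathit{Insert}^{\psi}_{\dot n} \Rightarrow \mathit{A1\text{-}Delayed}^{\psi}_{\dot n}$

2. $\neg\mathit{A1\text{-}Earliest}^{\psi}_{\dot n} \wedge \mathit{A1\text{-}Delayed}^{\psi}_{\dot n} \Rightarrow \forall \varphi \in \mathit{SubExpr}_\Phi(\psi).\ \neg\mathit{EM}_\Phi\text{-}\mathit{Insert}^{\varphi}_{\dot n}$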
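Proof. The proof of Part (1) is trivial for $\psi \in \Phi^0$. Hence let us assume $\psi \in \Phi^i$ with $i > 0$ and $\mathit{EM}_\Phi\text{-}\mathit{Insert}^{\psi}_{\dot n}$. Then the requirements on structured expression motion (cf. Definition 4.2.1) ensure that the subexpressions of $\psi$ are available at $\dot n$, i.e.

$\forall \varphi \in \mathit{SubExpr}_\Phi(\psi).\ \mathit{EM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n}$

Using our assumption on $\mathit{EM}_\Phi$ we have

$$\forall \varphi \in \mathit{SubExpr}_\Phi(\psi).\ \mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n} \tag{4.12}$$

According to Lemma 3.4.2(1), which is applicable due to the computational optimality of $\mathit{EM}_\Phi$, we further have $\mathit{Delayed}^{\psi}_{\dot n}$, which due to Lemma 3.4.1(1) implies $\mathit{DnSafe}^{\psi}_{\dot n}$. This together with Property (4.12) yields $\mathit{A1\text{-}DnSafe}^{\psi}_{\dot n}$. Hence Lemma 4.4.1(3) finally establishes $\mathit{A1\text{-}Delayed}^{\psi}_{\dot n}$.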

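For the proof of Part (2) let $\varphi \in \mathit{SubExpr}_\Phi(\psi)$. Then

$\neg\mathit{A1\text{-}Earliest}^{\psi}_{\dot n} \wedge \mathit{A1\text{-}Delayed}^{\psi}_{\dot n}$

[Definition A1-Delayed & A1-Latest] $\Rightarrow \forall \dot m \in \mathit{pred}(\dot n).\ \mathit{A1\text{-}Delayed}^{\psi}_{\dot m} \wedge \neg\mathit{A1\text{-}Latest}^{\psi}_{\dot m}$

[Lemma 4.4.1(2) & Definition A1-Latest] $\Rightarrow \forall \dot m \in \mathit{pred}(\dot n).\ \mathit{A1\text{-}DnSafe}^{\psi}_{\dot m} \wedge \neg\mathit{Comp}^{\psi}_{\dot m}$

[Definition A1-DnSafe] $\Rightarrow \forall \dot m \in \mathit{pred}(\dot n).\ \mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot m} \wedge \mathit{DnSafe}^{\psi}_{\dot m} \wedge \neg\mathit{Comp}^{\psi}_{\dot m}$

[Definition DnSafe] $\Rightarrow \forall \dot m \in \mathit{pred}(\dot n).\ \mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot m} \wedge \mathit{Transp}^{\psi}_{\dot m}$

[Lemma 4.2.1(1)] $\Rightarrow \forall \dot m \in \mathit{pred}(\dot n).\ \mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot m} \wedge \mathit{Transp}^{\varphi}_{\dot m}$

[$\mathit{EM}_{\Phi^{<i}} = \mathit{LFEM}_{\Phi^{<i}}$] $\Rightarrow \forall \dot m \in \mathit{pred}(\dot n).\ \mathit{EM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot m} \wedge \mathit{Transp}^{\varphi}_{\dot m}$

[$\mathit{EM}_\Phi \in \mathcal{COEM}_\Phi$] $\Rightarrow \neg\mathit{EM}_\Phi\text{-}\mathit{Insert}^{\varphi}_{\dot n}$ □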



4.4.4 Proving Inductive Lifetime Optimality

The proof of inductive lifetime optimality for $\mathit{LFEM}_\Phi$ is more challenging. We start this section by proving some basic properties of the UsedLater predicate. We have:



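Lemma 4.4.6 (UsedLater-Lemma).

Let $\mathit{EM}_\Phi \in \mathcal{COEM}_\Phi$ such that $\mathit{EM}_{\Phi^{<i}} = \mathit{LFEM}_{\Phi^{<i}}$ and $\dot n \in \dot N$. Then

1. $\forall \psi \in \Phi^i.\ \mathit{A1\text{-}DnSafe}^{\psi}_{\dot n} \wedge \mathit{Transp}^{\psi}_{\dot n} \Rightarrow \mathit{UsedLater}^{\psi}_{\dot n}[\Phi^{\le i}]$

2. a) $\forall \varphi \in \Phi^{\le i}.\ \mathit{EM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n} \wedge \mathit{UsedLater}^{\varphi}_{\dot n}[\Phi^{\le i}] \Rightarrow \dot n \in \mathit{SLtRg}(\mathit{EM}_{\Phi^{\le i}}, \varphi)$

b) $\forall \psi \in \Phi^i.\ \dot n \in \mathit{SLtRg}(\mathit{EM}_{\Phi^{\le i}}, \psi) \Rightarrow \mathit{EM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n} \wedge \mathit{UsedLater}^{\psi}_{\dot n}[\Phi^{\le i}]$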

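Proof. Considering Part (1) let us assume $\psi \in \Phi^i$ such that

$\mathit{A1\text{-}DnSafe}^{\psi}_{\dot n} \wedge \mathit{Transp}^{\psi}_{\dot n}$

holds. According to the definition of A1-DnSafe this means that there is a path $p \in \mathbf{P}[\dot n, \dot e]$ and an index $1 < j \le |p|$ such that:

$\mathit{Comp}^{\psi}_{p_j} \wedge \forall 1 \le k < j.\ \mathit{A1\text{-}DnSafe}^{\psi}_{p_k} \wedge \mathit{Transp}^{\psi}_{p_k}$

Using the fact that $\psi$ is maximal in $\Phi^{\le i}$ and the definition of A1-Earliest, this in particular implies:

$\mathit{LFEM}_{\Phi^{\le i}}\text{-}\mathit{Replace}^{\psi}_{p_j} \wedge \forall 1 < k \le j.\ \neg\mathit{A1\text{-}Earliest}^{\psi}_{p_k}$,

which means $\mathit{UsedLater}^{\psi}_{\dot n}[\Phi^{\le i}]$ as desired.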

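For Part (2a) we first note that the definition of $\mathit{EM}_\Phi$-Correct ensures that there is a path $p \in \mathbf{P}[\dot s, \dot n]$ and an index $r \le |p|$ such that

$$\mathit{EM}_{\Phi^{\le i}}\text{-}\mathit{Insert}^{\varphi}_{p_r} \wedge \forall r < k \le |p|.\ \neg\mathit{EM}_{\Phi^{\le i}}\text{-}\mathit{Insert}^{\varphi}_{p_k} \tag{4.13}$$

On the other hand, the condition $\mathit{UsedLater}^{\varphi}_{\dot n}[\Phi^{\le i}]$ ensures that there is a path $q \in \mathbf{P}[\dot n, \dot e]$ and an index $1 < j \le |q|$ such that

$$\big((\exists \psi \in \mathit{SupExpr}_{\Phi^i}(\varphi).\ \mathit{A1\text{-}Earliest}^{\psi}_{q_j}) \vee \mathit{LFEM}_{\Phi^{\le i}}\text{-}\mathit{Replace}^{\varphi}_{q_j}\big) \wedge \forall 1 < k \le j.\ \neg\mathit{A1\text{-}Earliest}^{\varphi}_{q_k} \tag{4.14}$$

Let us first investigate the case where $\mathit{LFEM}_{\Phi^{\le i}}\text{-}\mathit{Replace}^{\varphi}_{q_j}$ holds. Thus we have:

$$\underbrace{\mathit{LFEM}_{\Phi^{\le i}}\text{-}\mathit{Replace}^{\varphi}_{q_j}}_{\text{i)}} \wedge \underbrace{\forall 1 < k \le j.\ \neg\mathit{A1\text{-}Earliest}^{\varphi}_{q_k}}_{\text{ii)}} \tag{4.15}$$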

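Now we are going to prove:

$\forall 1 < k \le j.\ \neg\mathit{EM}_{\Phi^{\le i}}\text{-}\mathit{Insert}^{\varphi}_{q_k}$

Suppose there is an index $1 < k \le j$ with $\mathit{EM}_{\Phi^{\le i}}\text{-}\mathit{Insert}^{\varphi}_{q_k}$. Then Lemma 4.4.5(1) ensures that $q_k$ is preceded, on every path from $\dot s$ to $q_k$, by a program point satisfying A1-Earliest. Using Condition (ii), this point cannot lie strictly behind $\dot n$, yielding

$\forall 1 \le l < k.\ \mathit{A1\text{-}Delayed}^{\varphi}_{q_l} \wedge \neg\mathit{A1\text{-}Latest}^{\varphi}_{q_l}$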
Since $\mathit{EM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n}$ holds, this, however, would mean that $\mathit{EM}_\Phi$ has two $\varphi$-insertions on an interval of delayable program points, which contradicts Lemma 3.4.2(2). Similarly, the previous argument also ensures that Condition (i) implies $\mathit{EM}_{\Phi^{\le i}}\text{-}\mathit{Replace}^{\varphi}_{q_j}$, since otherwise $\mathit{EM}_{\Phi^{\le i}}$ would again have two computations of $\varphi$ on an interval of delayable program points, contradicting Lemma 3.4.2(2). Hence Condition (4.15) can be transformed to

$$\underbrace{\mathit{EM}_{\Phi^{\le i}}\text{-}\mathit{Replace}^{\varphi}_{q_j}}_{\text{i)}} \wedge \underbrace{\forall 1 < k \le j.\ \neg\mathit{EM}_{\Phi^{\le i}}\text{-}\mathit{Insert}^{\varphi}_{q_k}}_{\text{ii)}} \tag{4.16}$$



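Investigating the other alternative of Condition (4.14), the assumption on $\mathit{EM}_\Phi$ delivers:

$$\underbrace{\exists \psi \in \mathit{SupExpr}_{\Phi^i}(\varphi).\ \mathit{A1\text{-}Earliest}^{\psi}_{q_j}}_{\text{iii)}} \wedge \underbrace{\forall 1 < k \le j.\ \neg\mathit{EM}_{\Phi^{\le i}}\text{-}\mathit{Insert}^{\varphi}_{q_k}}_{\text{iv)}} \tag{4.17}$$

Using Lemma 4.4.5(1), Condition (iii) further ensures that there is an index $l \ge j$ such that $\mathit{EM}_{\Phi^{\le i}}\text{-}\mathit{Insert}^{\psi}_{q_l}$ holds and such that

$\forall j < k \le l.\ \mathit{A1\text{-}Delayed}^{\psi}_{q_k} \wedge \neg\mathit{A1\text{-}Earliest}^{\psi}_{q_k}$

Due to Lemma 4.4.5(2) this excludes a $\varphi$-insertion of $\mathit{EM}_\Phi$ between $q_{j+1}$ and $q_l$. Hence we finally get

$$\exists \psi \in \mathit{SupExpr}_{\Phi^i}(\varphi).\ \mathit{EM}_{\Phi^{\le i}}\text{-}\mathit{Insert}^{\psi}_{q_l} \wedge \forall 1 < k \le l.\ \neg\mathit{EM}_{\Phi^{\le i}}\text{-}\mathit{Insert}^{\varphi}_{q_k} \tag{4.18}$$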

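Putting together the "upward situation" of Condition (4.13) and the "downward situation" of Condition (4.16) or Condition (4.18), respectively, we obtain a path $p$ through $\dot n$ with $|p| \ge 2$ and:

i) $\mathit{EM}_{\Phi^{\le i}}\text{-}\mathit{Insert}^{\varphi}_{p_1}$

ii) $(\exists \psi \in \mathit{SupExpr}_{\Phi^i}(\varphi).\ \mathit{EM}_{\Phi^{\le i}}\text{-}\mathit{Insert}^{\psi}_{p_{|p|}}) \vee \mathit{EM}_{\Phi^{\le i}}\text{-}\mathit{Replace}^{\varphi}_{p_{|p|}}$

iii) $\forall 1 < k \le |p|.\ \neg\mathit{EM}_{\Phi^{\le i}}\text{-}\mathit{Insert}^{\varphi}_{p_k}$

This, however, means $\dot n \in \mathit{SLtRg}(\mathit{EM}_{\Phi^{\le i}}, \varphi)$.

For Part (2b), Lemma 4.2.7 delivers $\mathit{EM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$. On the other hand, $\mathit{UsedLater}^{\psi}_{\dot n}[\Phi^{\le i}]$ is immediate from the maximality of $\psi$, which requires that the lifetime range reaches an original replacement site of $\psi$. □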
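We continue with some useful properties of trade-off candidates:

Lemma 4.4.7 (Trade-off Candidate Lemma for $\mathit{LFEM}_\Phi$).

1. $\forall \dot n \in \dot N, \dot m \in \mathit{pred}(\mathit{succ}(\dot n)).\ \Theta^{up(i)}_{\dot n} = \Theta^{up(i)}_{\dot m}$

2. a) $\forall \dot n \in \dot N, \psi \in \Theta^{up(i)}_{\dot n}.\ \mathit{UsedLater}^{\psi}_{\dot n}[\Phi^{\le i}]$

b) $\forall \dot n \in \dot N, \varphi \in \Theta^{dn(i)}_{\dot n}.\ \mathit{UsedLater}^{\varphi}_{\dot n}[\Phi^{<i}]$

3. $\forall \dot n \in \dot N, \varphi \in \Theta^{dn(i)}_{\dot n}.\ \mathit{LFEM}_{\Phi^{<i}}\text{-}\mathit{Correct}^{\varphi}_{\dot n}$

4. a) $\forall \dot n \in \dot N, \psi \in \Theta^{up(i)}_{\dot n}, \dot n_s \in \mathit{succ}(\dot n).\ \neg\mathit{A1\text{-}Earliest}^{\psi}_{\dot n_s}$

b) $\forall \dot n \in \dot N, \varphi \in \Theta^{dn(i)}_{\dot n}, \dot n_s \in \mathit{succ}(\dot n).\ \neg\mathit{A1\text{-}Earliest}^{\varphi}_{\dot n_s}$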

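Proof. Considering Part (1) let $\psi \in \Theta^{up(i)}_{\dot n}$. If $\dot n$ is an entry point of a node there is nothing to show, since $\mathit{pred}(\mathit{succ}(\dot n)) = \{\dot n\}$. Similarly, if $|\mathit{succ}(\dot n)| \ge 2$, then Control Flow Lemma 2.2.1(2) ensures $\mathit{pred}(\mathit{succ}(\dot n)) = \{\dot n\}$. Therefore, we may assume:

$$\mathit{succ}(\dot n) = \{\dot n_s\} \tag{4.19}$$

Then we have:

$\psi \in \Theta^{up(i)}_{\dot n}$

[Definition $\Theta^{up(i)}_{\dot n}$] $\Rightarrow \mathit{A1\text{-}Delayed}^{\psi}_{\dot n} \wedge \neg\mathit{A1\text{-}Latest}^{\psi}_{\dot n}$

[Definition A1-Latest & Lemma 4.4.4] $\Rightarrow \mathit{A1\text{-}Delayed}^{\psi}_{\dot n_s} \wedge \neg\mathit{A1\text{-}Earliest}^{\psi}_{\dot n_s}$

[Definition A1-Delayed & A1-Latest] $\Rightarrow \mathit{A1\text{-}Delayed}^{\psi}_{\dot n_s} \wedge \forall \dot m \in \mathit{pred}(\dot n_s).\ \mathit{A1\text{-}Delayed}^{\psi}_{\dot m} \wedge \neg\mathit{A1\text{-}Latest}^{\psi}_{\dot m}$

[Assumption (4.19)] $\Rightarrow \forall \dot m \in \mathit{pred}(\mathit{succ}(\dot n)).\ \mathit{A1\text{-}Delayed}^{\psi}_{\dot m} \wedge \neg\mathit{A1\text{-}Latest}^{\psi}_{\dot m}$

[Definition $\Theta^{up(i)}$] $\Rightarrow \forall \dot m \in \mathit{pred}(\mathit{succ}(\dot n)).\ \psi \in \Theta^{up(i)}_{\dot m}$

Symmetric arguments prove the equality of all $\Theta^{up(i)}_{\dot m}$ for $\dot m \in \mathit{pred}(\mathit{succ}(\dot n))$.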
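For Part (2a) we have:

$\psi \in \Theta^{up(i)}_{\dot n}$

[Definition $\Theta^{up(i)}_{\dot n}$] $\Rightarrow \mathit{A1\text{-}Delayed}^{\psi}_{\dot n} \wedge \neg\mathit{A1\text{-}Latest}^{\psi}_{\dot n}$

[Definition A1-Latest] $\Rightarrow \mathit{A1\text{-}Delayed}^{\psi}_{\dot n} \wedge \neg\mathit{Comp}^{\psi}_{\dot n}$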







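[Lemma 4.4.1(2)] $\Rightarrow \mathit{A1\text{-}DnSafe}^{\psi}_{\dot n} \wedge \neg\mathit{Comp}^{\psi}_{\dot n}$

[Definition A1-DnSafe] $\Rightarrow \mathit{A1\text{-}DnSafe}^{\psi}_{\dot n} \wedge \mathit{Transp}^{\psi}_{\dot n}$

[Lemma 4.4.6(1)] $\Rightarrow \mathit{UsedLater}^{\psi}_{\dot n}[\Phi^{\le i}]$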

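For the proof of Part (2b) we may use parts of the proof of Part (2a):

$\varphi \in \Theta^{dn(i)}_{\dot n}$

[Definition $\Theta^{dn(i)}_{\dot n}$] $\Rightarrow \exists \psi \in \mathit{neigh}(\varphi).\ \psi \in \Theta^{up(i)}_{\dot n}$

[Proof of Part (2a)] $\Rightarrow \exists \psi \in \mathit{neigh}(\varphi).\ \mathit{A1\text{-}DnSafe}^{\psi}_{\dot n} \wedge \mathit{Transp}^{\psi}_{\dot n}$

[Lemmas 4.2.1(1) & 4.4.3] $\Rightarrow \mathit{A1\text{-}DnSafe}^{\varphi}_{\dot n} \wedge \mathit{Transp}^{\varphi}_{\dot n}$

[Lemma 4.4.6(1)] $\Rightarrow \mathit{UsedLater}^{\varphi}_{\dot n}[\Phi^{<i}]$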

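For the proof of Part (3) let $\varphi \in \Theta^{dn(i)}_{\dot n}$. Obviously, there is then an expression $\psi \in \mathit{neigh}(\varphi)$ with $\psi \in \Theta^{up(i)}_{\dot n}$ that gives rise to the following argument:

$\psi \in \Theta^{up(i)}_{\dot n}$

[Definition $\Theta^{up(i)}_{\dot n}$] $\Rightarrow \mathit{A1\text{-}Delayed}^{\psi}_{\dot n}$

[Lemma 4.4.1(2)] $\Rightarrow \mathit{A1\text{-}DnSafe}^{\psi}_{\dot n}$

[Definition A1-DnSafe] $\Rightarrow \mathit{LFEM}_{\Phi^{<i}}\text{-}\mathit{Correct}^{\varphi}_{\dot n}$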

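For the proof of Part (4a) we have for $\dot n_s \in \mathit{succ}(\dot n)$:

$\psi \in \Theta^{up(i)}_{\dot n}$

[Part (1)] $\Rightarrow \forall \dot m \in \mathit{pred}(\dot n_s).\ \psi \in \Theta^{up(i)}_{\dot m}$

[Proof of Part (2a)] $\Rightarrow \forall \dot m \in \mathit{pred}(\dot n_s).\ \mathit{A1\text{-}DnSafe}^{\psi}_{\dot m} \wedge \mathit{Transp}^{\psi}_{\dot m}$

[Definition A1-Earliest] $\Rightarrow \neg\mathit{A1\text{-}Earliest}^{\psi}_{\dot n_s}$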

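Finally, Part (4b) can rely on the proof of Part (4a):

$\varphi \in \Theta^{dn(i)}_{\dot n}$

[Definition $\Theta^{dn(i)}_{\dot n}$] $\Rightarrow \exists \psi \in \mathit{neigh}(\varphi).\ \psi \in \Theta^{up(i)}_{\dot n}$

[Proof of Part (4a)] $\Rightarrow \exists \psi \in \mathit{neigh}(\varphi)\ \forall \dot m \in \mathit{pred}(\dot n_s).\ \mathit{A1\text{-}DnSafe}^{\psi}_{\dot m} \wedge \mathit{Transp}^{\psi}_{\dot m}$

[Lemmas 4.2.1(1) & 4.4.3] $\Rightarrow \forall \dot m \in \mathit{pred}(\dot n_s).\ \mathit{A1\text{-}DnSafe}^{\varphi}_{\dot m} \wedge \mathit{Transp}^{\varphi}_{\dot m}$

[Definition A1-Earliest] $\Rightarrow \neg\mathit{A1\text{-}Earliest}^{\varphi}_{\dot n_s}$ □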
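The trade-off candidates used in Lemma 4.4.7 can be assembled pointwise once the adjusted predicates are known. The sketch below does this for one fixed program point. $\Theta^{up(i)}_{\dot n}$ follows the definition used in the proof of Part (2a); the membership test for $\Theta^{dn(i)}_{\dot n}$ (not used later with respect to $\Phi^{\le i}$ and having a superexpression in $\Theta^{up(i)}_{\dot n}$) is reconstructed from how the proofs use it, since the formal definition lies outside this excerpt. The dictionary encoding and all names are hypothetical.

```python
from typing import Dict, Set, Tuple


def trade_off_candidates(
    level_i: Set[str],
    lower: Set[str],
    subexpr: Dict[str, Set[str]],
    a1_delayed: Dict[str, bool],
    a1_latest: Dict[str, bool],
    used_later: Dict[str, bool],
) -> Tuple[Set[str], Set[str], Set[Tuple[str, str]]]:
    """Upper/lower trade-off candidates at one fixed program point.

    level_i    : expressions of level i
    lower      : expressions of levels < i
    subexpr    : immediate subexpressions of each level-i expression
    a1_delayed : A1-Delayed of the level-i expressions at this point
    a1_latest  : A1-Latest of the level-i expressions at this point
    used_later : UsedLater[Phi^{<=i}] of the lower-level expressions here
    """
    # Theta^up(i): A1-Delayed but not A1-Latest, i.e. a premature
    # initialisation at this point would still be admissible.
    up = {psi for psi in level_i if a1_delayed[psi] and not a1_latest[psi]}
    # Theta^dn(i): not used later and having a superexpression in Theta^up(i).
    dn = {phi for phi in lower
          if not used_later[phi] and any(phi in subexpr[psi] for psi in up)}
    # Bipartite graph: each lower candidate is linked to its superexpressions
    # among the upper candidates.
    edges = {(phi, psi) for psi in up for phi in subexpr[psi] if phi in dn}
    return up, dn, edges
```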

The following lemma plays a key role in the whole proof of inductive lifetime optimality, as it ensures that the local trade-off information given by the LChange predicate is in perfect accordance with its globalisation in terms of the second adjustment of delayability. This means that a premature initialisation that is found to be profitable at one program point remains profitable (from a local point of view) within the whole delayability range. Besides being important for the proof of inductive lifetime optimality, this property also simplifies the implementation, since the adjustment of delayability can be computed by a trivial local refinement step (cf. Section 4.4.5). Finally, however, we should note that the result relies on Lemma 4.4.7(1) and therefore crucially depends on the absence of critical edges.
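Since Lemma 4.4.7(1) is only valid in the absence of critical edges, a flow graph is usually prepared by splitting them. A critical edge leads from a node with more than one successor to a node with more than one predecessor. The sketch below detects and splits such edges on a node-level graph given as an edge list; the representation and the naming scheme for the synthetic nodes are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Edge = Tuple[str, str]


def critical_edges(edges: List[Edge]) -> Set[Edge]:
    """Edges from a node with several successors to a node with several predecessors."""
    succ: DefaultDict[str, Set[str]] = defaultdict(set)
    pred: DefaultDict[str, Set[str]] = defaultdict(set)
    for m, n in edges:
        succ[m].add(n)
        pred[n].add(m)
    return {(m, n) for m, n in edges if len(succ[m]) > 1 and len(pred[n]) > 1}


def split_critical_edges(edges: List[Edge]) -> List[Edge]:
    """Replace every critical edge (m, n) by m -> S_m_n -> n with a fresh node."""
    critical = critical_edges(edges)
    result: List[Edge] = []
    for m, n in edges:
        if (m, n) in critical:
            synthetic = f"S_{m}_{n}"  # hypothetical naming scheme
            result.extend([(m, synthetic), (synthetic, n)])
        else:
            result.append((m, n))
    return result
```

The synthetic node has exactly one predecessor and one successor, so after splitting no edge remains critical, which is the situation assumed by Control Flow Lemma 2.2.1(2).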

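Lemma 4.4.8 (Trade-off Lemma for $\mathit{LFEM}_\Phi$).

$\forall \dot n \in \dot N, \psi \in \Theta^{up(i)}_{\dot n}.\ \mathit{LChange}^{\psi}_{\dot n} \Leftrightarrow \forall \dot m \in \mathit{succ}(\dot n).\ \neg\mathit{A2\text{-}Delayed}^{\psi}_{\dot m}$

Proof. The $\Rightarrow$-direction is almost trivial, as it is a direct consequence of the definition of A2-Delayed and Lemma 4.4.7(4a). In contrast, the proof of the $\Leftarrow$-direction is more intricate. Therefore, let us consider a program point $\dot m \in \mathit{succ}(\dot n)$ with $\neg\mathit{A2\text{-}Delayed}^{\psi}_{\dot m}$.¹ Since $\psi \in \Theta^{up(i)}_{\dot n}$, there is a path $p \in \mathbf{P}[\dot s, \dot m]$ on which A2-Delayed is terminated by the LChange predicate, i.e. there is an index $j < |p|$ such that the following two properties are satisfied:

$$\mathit{LChange}^{\psi}_{p_j} \tag{4.20a}$$

$$\forall j \le k < |p|.\ \mathit{A1\text{-}Delayed}^{\psi}_{p_k} \wedge \neg\mathit{A1\text{-}Latest}^{\psi}_{p_k} \tag{4.20b}$$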

Now we are going to show that the LChange predicate stays true after position $j$, i.e.

$\forall j \le k < |p|.\ \mathit{LChange}^{\psi}_{p_k}$

This is proved by means of an induction on $k - j$.

Induction Base: $k - j = 0$.

Trivial due to Assumption (4.20a).

Induction Step: $k - j > 0$, $k < |p|$.

Here we aim at constructing an irreducible subset $S \subseteq \Theta^{dn}_k$ such that $\psi \in \mathit{neigh}_k(S)$.² In this case $\psi \in \mathit{neigh}_k(T(\Theta^{dn}_k))$ follows from Lemma 4.3.1(2), which is a synonym for $\mathit{LChange}^{\psi}_{p_k}$.

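Let us consider the following sets of expressions:

$T \stackrel{\text{def}}{=} \mathit{neigh}_{k-1}(T(\Theta^{dn}_{k-1})) \cap \Theta^{up}_k$

$S \stackrel{\text{def}}{=} \mathit{neigh}_{k-1}(T) \cap T(\Theta^{dn}_{k-1})$

¹ Note that $\psi \in \Theta^{up(i)}_{\dot n}$ in particular excludes $\dot n = \dot e$, which guarantees the existence of such a successor.

² For the sake of simplicity we write $\Theta^{up}_k$ and $\Theta^{dn}_k$ in place of $\Theta^{up(i)}_{p_k}$ and $\Theta^{dn(i)}_{p_k}$, respectively. Vertices of the bipartite graphs involved are identified with their associated expressions. In order to make the underlying bipartite graph explicit we use the indexed notations $\mathit{neigh}_{k-1}$ and $\mathit{neigh}_k$, respectively.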
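We now show that $S$ defines an irreducible subset of $\Theta^{dn}_k$. To this end we first prove:

A) $S \subseteq \Theta^{dn}_k$

B) $\mathit{neigh}_k(S) \subseteq \Theta^{up}_{k-1}$

Starting with the proof of A), let $\varphi \in S$. Since $\mathit{SupExpr}_{\Theta^{up}_k}(\varphi) \neq \emptyset$ is ensured by construction, it is sufficient to show

$\neg\mathit{UsedLater}^{\varphi}_{p_k}[\Phi^{\le i}]$

Exploiting that $\varphi \in \Theta^{dn}_{k-1}$ holds, we obtain $\neg\mathit{UsedLater}^{\varphi}_{p_{k-1}}[\Phi^{\le i}]$. By the definition of $\mathit{UsedLater}^{\varphi}_{p_{k-1}}[\Phi^{\le i}]$, the proof of $\neg\mathit{UsedLater}^{\varphi}_{p_k}[\Phi^{\le i}]$ can be accomplished by showing that $\neg\mathit{A1\text{-}Earliest}^{\varphi}_{p_k}$ holds. This, however, is a trivial consequence of the definition of $S$, which grants $\varphi \in \Theta^{dn}_{k-1}$, and Lemma 4.4.7(4b).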

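For the proof of B) let $\varphi \in S$ and $\psi \in \mathit{neigh}_k(\varphi)$. First we show

$$\neg\mathit{A1\text{-}Earliest}^{\psi}_{p_k} \tag{4.21}$$

As in the proof of Inclusion (A) we have $\neg\mathit{A1\text{-}Earliest}^{\varphi}_{p_k}$, which in turn also forces $\neg\mathit{A1\text{-}Earliest}^{\psi}_{p_k}$, since otherwise $\mathit{UsedLater}^{\varphi}_{p_{k-1}}[\Phi^{\le i}]$ would be valid, in contradiction to $\varphi \in \Theta^{dn}_{k-1}$. Then we have:

$\psi \in \Theta^{up}_k$

[Definition $\Theta^{up}_k$] $\Rightarrow \mathit{A1\text{-}Delayed}^{\psi}_{p_k}$

[Property (4.21)] $\Rightarrow \mathit{A1\text{-}Delayed}^{\psi}_{p_k} \wedge \neg\mathit{A1\text{-}Earliest}^{\psi}_{p_k}$

[Defs. Delayed & Latest] $\Rightarrow \mathit{A1\text{-}Delayed}^{\psi}_{p_{k-1}} \wedge \neg\mathit{A1\text{-}Latest}^{\psi}_{p_{k-1}}$

[Definition $\Theta^{up}_{k-1}$] $\Rightarrow \psi \in \Theta^{up}_{k-1}$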

Based on Condition (A) we may investigate properties of $S$ when considered at program point $p_k$:

C) $\mathit{neigh}_k(S) \supseteq T$

D) $S$ is an irreducible subset of $\Theta^{dn}_k$

Starting with the proof of (C), the definitions of $S$ and $T$ obviously imply $\mathit{neigh}_{k-1}(S) \supseteq T$. Since $S \subseteq \Theta^{dn}_k$ (according to (A)) and $T \subseteq \Theta^{up}_k$ (by definition), this carries over to program point $p_k$, yielding $\mathit{neigh}_k(S) \supseteq T$.

For the proof of (D), Lemma 4.3.1(1) and the definition of $S$ deliver

D') $S$ is an irreducible subset of $\Theta^{dn}_{k-1}$
Because (B) implies that a tightness defect of $S$ at $p_k$ would also be a tightness defect of $S$ at $p_{k-1}$, (D) is then an obvious consequence of (D').

We finish the proof by showing:

E) $\psi \in \mathit{neigh}_k(T(\Theta^{dn}_k))$

Using the induction hypothesis we have $\mathit{LChange}^{\psi}_{p_{k-1}}$, which stands as a synonym for $\psi \in \mathit{neigh}_{k-1}(T(\Theta^{dn}_{k-1}))$. Moreover, Assumption (4.20b) delivers $\psi \in \Theta^{up}_k$. Together we have $\psi \in T$. Using (D) and Lemma 4.3.1(2) we obtain $S \subseteq T(\Theta^{dn}_k)$. Thus, using Condition (C), this also implies $\mathit{neigh}_k(T(\Theta^{dn}_k)) \supseteq T$, which finally yields $\mathit{LChange}^{\psi}_{p_k}$ as desired. □

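Analysing the proof of Lemma 4.4.8 we recognise:

Remark. Lemma 4.4.8 can alternatively be formulated with existential in place of universal quantification, i.e.

$\forall \dot n \in \dot N, \psi \in \Theta^{up(i)}_{\dot n}.\ \mathit{LChange}^{\psi}_{\dot n} \Leftrightarrow \exists \dot m \in \mathit{succ}(\dot n).\ \neg\mathit{A2\text{-}Delayed}^{\psi}_{\dot m}$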
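The proofs above identify $\mathit{LChange}^{\psi}_{\dot n}$ with $\psi \in \mathit{neigh}(T(\Theta^{dn(i)}_{\dot n}))$, where $T$ is a subset of the lower candidates with maximal deficiency $|S| - |\mathit{neigh}(S)|$ (cf. Theorem 4.3.3). The algorithm of Section 4.3 is not part of this excerpt, so the following is only a sketch of one standard matching-based way to obtain such a set: after computing a maximum matching, every lower vertex reachable by an alternating path from an unmatched upper vertex is excluded, and the remaining lower vertices form the largest subset of maximal deficiency. Whether $T$ in the text denotes exactly this largest set is our assumption; the graph encoding and the function names are hypothetical.

```python
from typing import Dict, List, Set


def largest_max_deficiency_set(dn: List[str], adj: Dict[str, List[str]]) -> Set[str]:
    """Largest subset S of dn maximising |S| - |neigh(S)|.

    adj maps each lower candidate to its neighbours among the upper candidates.
    """
    match_up: Dict[str, str] = {}  # upper vertex -> matched lower vertex
    match_dn: Dict[str, str] = {}  # lower vertex -> matched upper vertex

    def augment(d: str, seen: Set[str]) -> bool:
        # Kuhn's augmenting-path search starting from the lower vertex d.
        for u in adj.get(d, []):
            if u not in seen:
                seen.add(u)
                if u not in match_up or augment(match_up[u], seen):
                    match_up[u] = d
                    match_dn[d] = u
                    return True
        return False

    for d in dn:
        augment(d, set())

    # Reverse adjacency: upper vertex -> lower neighbours.
    up_adj: Dict[str, List[str]] = {}
    for d in dn:
        for u in adj.get(d, []):
            up_adj.setdefault(u, []).append(d)

    # Lower vertices reachable from an unmatched upper vertex by an
    # alternating path can never belong to a maximum-deficiency subset.
    stack = [u for u in up_adj if u not in match_up]
    visited = set(stack)
    excluded: Set[str] = set()
    while stack:
        u = stack.pop()
        for d in up_adj[u]:
            if d not in excluded:
                excluded.add(d)
                mate = match_dn.get(d)  # continue along the matching edge
                if mate is not None and mate not in visited:
                    visited.add(mate)
                    stack.append(mate)
    return set(dn) - excluded


def lchange_candidates(dn: List[str], adj: Dict[str, List[str]]) -> Set[str]:
    """Upper candidates psi with psi in neigh(T(dn))."""
    return {u for d in largest_max_deficiency_set(dn, adj) for u in adj.get(d, [])}
```

For example, two lower temporaries that are both only needed by one upper candidate form a set of deficiency 1, so initialising that upper candidate prematurely releases two temporaries at the cost of one; a lower temporary shared by two upper candidates alone has negative deficiency and is not traded.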

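Before stating the proposed lifetime optimality result, we shall investigate some important properties of lifetime ranges. In particular, the following lemma is devoted to results expressing lifetime ranges in terms of upper and lower trade-off candidates. Whereas the first part refers to an arbitrary computationally optimal expression motion, the second part addresses the special situation under $\mathit{LFEM}_\Phi$, where Trade-off Lemma 4.4.8 can be employed in order to yield a particularly simple characterisation.

Lemma 4.4.9 (Lifetime Range Lemma for $\mathit{LFEM}_\Phi$).

Let $\mathit{EM}_\Phi \in \mathcal{COEM}_\Phi$ such that $\mathit{EM}_{\Phi^{<i}} = \mathit{LFEM}_{\Phi^{<i}}$ and $\dot n \in \dot N$. Then we have:

1. $\forall \varphi \in \Theta^{dn(i)}_{\dot n}.\ \dot n \in \mathit{SLtRg}(\mathit{EM}_{\Phi^{\le i}}, \varphi) \Leftrightarrow \exists \psi \in \mathit{neigh}(\varphi).\ \neg\mathit{EM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$

2. a) $\forall \psi \in \Theta^{up(i)}_{\dot n}.\ \dot n \in \mathit{SLtRg}(\mathit{LFEM}_{\Phi^{\le i}}, \psi) \Leftrightarrow \mathit{LChange}^{\psi}_{\dot n}$

b) $\forall \varphi \in \Theta^{dn(i)}_{\dot n}.\ \dot n \in \mathit{SLtRg}(\mathit{LFEM}_{\Phi^{\le i}}, \varphi) \Leftrightarrow \varphi \notin T(\Theta^{dn(i)}_{\dot n})$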

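Proof of 1: Starting with the $\Rightarrow$-direction, let us assume $\varphi \in \Theta^{dn(i)}_{\dot n}$ such that $\dot n \in \mathit{SLtRg}(\mathit{EM}_{\Phi^{\le i}}, \varphi)$. As $\varphi \in \Theta^{dn(i)}_{\dot n}$ implies $\neg\mathit{UsedLater}^{\varphi}_{\dot n}[\Phi^{\le i}]$, we can exclude a strictly later original replacement of $\varphi$. Thus the definition of $\mathit{SLtRg}(\mathit{EM}_{\Phi^{\le i}}, \varphi)$ yields that there is an expression $\psi \in \mathit{SupExpr}_{\Phi^i}(\varphi)$ such that $\varphi$ is used for the initialisation of this superexpression. That means there is a path $p \in \mathbf{P}[\dot n, \dot e]$ and an index $1 < j \le |p|$ with

$\mathit{EM}_\Phi\text{-}\mathit{Insert}^{\psi}_{p_j} \wedge \forall 1 < k \le j.\ \neg\mathit{EM}_\Phi\text{-}\mathit{Insert}^{\varphi}_{p_k}$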
Using the general assumption on $\mathit{EM}_\Phi$ this can be rewritten as

$$\mathit{EM}_\Phi\text{-}\mathit{Insert}^{\psi}_{p_j} \wedge \forall 1 < k \le j.\ \neg\mathit{LFEM}_\Phi\text{-}\mathit{Insert}^{\varphi}_{p_k} \tag{4.22}$$

Now we are going to show that this also implies

$$\forall 1 < k \le j.\ \neg\mathit{A1\text{-}Earliest}^{\varphi}_{p_k} \tag{4.23}$$

Suppose there were an index $1 < k \le j$ such that $\mathit{A1\text{-}Earliest}^{\varphi}_{p_k}$ holds. Then Lemma 4.2.5(3) implies that A1-Delayed for $\varphi$ is terminated at $p_{j+1}$ at the latest. Hence, according to Lemma 4.4.4, $\mathit{LFEM}_\Phi$ must have a $\varphi$-insertion on $(p_k, \ldots, p_j)$, in contradiction to Property (4.22).

Furthermore, using Lemma 4.4.5(1) we have $\mathit{A1\text{-}Delayed}^{\psi}_{p_j}$. According to the definition of the A1-Delayed predicate, program point $p_j$ must be preceded on every path from $\dot s$ to $p_j$ by a program point that satisfies A1-Earliest. Suppose such a position were situated on the path $(p_1, \ldots, p_j)$ strictly behind $\dot n$, i.e. there is an index $1 < k \le j$ such that $\mathit{A1\text{-}Earliest}^{\psi}_{p_k}$ holds. Then this together with Property (4.23) delivers

$\mathit{A1\text{-}Earliest}^{\psi}_{p_k} \wedge \forall 1 < l \le k.\ \neg\mathit{A1\text{-}Earliest}^{\varphi}_{p_l}$,

which therefore means $\mathit{UsedLater}^{\varphi}_{\dot n}[\Phi^{\le i}]$, in contradiction to the assumption $\varphi \in \Theta^{dn(i)}_{\dot n}$. Hence we have

$\forall 1 \le k < j.\ \mathit{A1\text{-}Delayed}^{\psi}_{p_k} \wedge \neg\mathit{A1\text{-}Latest}^{\psi}_{p_k}$

In particular, this means $\psi \in \Theta^{up(i)}_{\dot n}$ and thus $\psi \in \mathit{neigh}(\varphi)$. On the other hand, we also have $\neg\mathit{EM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$, as otherwise $\mathit{EM}_\Phi$ would have two $\psi$-insertions on an interval of delayable program points, in contradiction to Lemma 3.4.2(2).

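For the proof of the $\Leftarrow$-direction we shall use Lemma 3.4.2(2), which yields that the assumption $\neg\mathit{EM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$ together with $\mathit{Delayed}^{\psi}_{\dot n}$ implies that there is a path $p \in \mathbf{P}[\dot n, \dot e]$ and an index $1 < j \le |p|$ with

$\mathit{EM}_\Phi\text{-}\mathit{Insert}^{\psi}_{p_j} \wedge \forall 1 \le k < j.\ \mathit{Delayed}^{\psi}_{p_k} \wedge \neg\mathit{Latest}^{\psi}_{p_k}$

Since $\psi \in \Theta^{up(i)}_{\dot n}$, and following Lemma 4.4.4, this can be strengthened to

$$\mathit{EM}_\Phi\text{-}\mathit{Insert}^{\psi}_{p_j} \wedge \forall 1 \le k < j.\ \mathit{A1\text{-}Delayed}^{\psi}_{p_k} \wedge \neg\mathit{A1\text{-}Latest}^{\psi}_{p_k} \tag{4.24}$$

This almost establishes the desired result $\dot n \in \mathit{SLtRg}(\mathit{EM}_{\Phi^{\le i}}, \varphi)$: $\mathit{EM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n}$ holds by Lemma 4.4.7(3), and in addition $p_j$ of Property (4.24) defines a potential use site of $h_\varphi$. In order to ensure that $p_j$ is indeed a use site of $h_\varphi$, it remains to prove that $\mathit{EM}_\Phi$ has no other $\varphi$-insertions between $\dot n$ and $p_j$, i.e.

$$\forall 1 < k \le j.\ \neg\mathit{EM}_\Phi\text{-}\mathit{Insert}^{\varphi}_{p_k} \tag{4.25}$$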
To this end Property (4.24) gives us

$\forall 1 < k \le j.\ \mathit{A1\text{-}Delayed}^{\psi}_{p_k} \wedge \neg\mathit{A1\text{-}Earliest}^{\psi}_{p_k}$,

which directly implies Property (4.25) according to Lemma 4.4.5(2).

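Proof of 2a: We are first going to show

$$\forall \psi \in \Theta^{up(i)}_{\dot n}.\ \mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n} \Leftrightarrow \mathit{LChange}^{\psi}_{\dot n} \tag{4.26}$$

We start by proving the contrapositive of the $\Rightarrow$-direction, which is the more interesting one, as it uses the non-trivial direction of Trade-off Lemma 4.4.8:

$\neg\mathit{LChange}^{\psi}_{\dot n}$

[Lemma 4.4.8] $\Rightarrow \exists \dot m \in \mathit{succ}(\dot n).\ \mathit{A2\text{-}Delayed}^{\psi}_{\dot m}$

[Lemma 4.4.7(4a)] $\Rightarrow \exists \dot m \in \mathit{succ}(\dot n).\ \mathit{A2\text{-}Delayed}^{\psi}_{\dot m} \wedge \neg\mathit{A1\text{-}Earliest}^{\psi}_{\dot m}$

[Def. A2-Delayed] $\Rightarrow \mathit{A2\text{-}Delayed}^{\psi}_{\dot n}$

[Lemma 4.4.4] $\Rightarrow \neg\mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$

For the $\Leftarrow$-direction we have

$\mathit{LChange}^{\psi}_{\dot n}$

[$\psi \in \Theta^{up(i)}_{\dot n}$] $\Rightarrow \mathit{LChange}^{\psi}_{\dot n} \wedge \mathit{A1\text{-}Delayed}^{\psi}_{\dot n} \wedge \neg\mathit{A1\text{-}Latest}^{\psi}_{\dot n}$

[Lemma 4.4.8] $\Rightarrow \forall \dot m \in \mathit{succ}(\dot n).\ \neg\mathit{A2\text{-}Delayed}^{\psi}_{\dot m} \wedge \mathit{A1\text{-}Delayed}^{\psi}_{\dot n} \wedge \neg\mathit{A1\text{-}Latest}^{\psi}_{\dot n}$

[Lemma 4.4.4] $\Rightarrow \mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$

It is immediate from Lemma 4.2.7 that the $\Rightarrow$-direction of Proposition (4.26) already implies the $\Rightarrow$-direction of Part (2a). For the $\Leftarrow$-direction we exploit that Lemma 4.4.7(2a) delivers $\mathit{UsedLater}^{\psi}_{\dot n}[\Phi^{\le i}]$. Hence this and the $\Leftarrow$-direction of Proposition (4.26) yield

$\mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n} \wedge \mathit{UsedLater}^{\psi}_{\dot n}[\Phi^{\le i}]$,

which according to Lemma 4.4.6(2a) implies $\dot n \in \mathit{SLtRg}(\mathit{LFEM}_{\Phi^{\le i}}, \psi)$.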

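Proof of 2b: Here we succeed by using the results of the previous parts:

$\varphi \in T(\Theta^{dn(i)}_{\dot n})$

[Definition LChange] $\Leftrightarrow \forall \psi \in \mathit{neigh}(\varphi).\ \mathit{LChange}^{\psi}_{\dot n}$

[Proposition (4.26)] $\Leftrightarrow \forall \psi \in \mathit{neigh}(\varphi).\ \mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$

[Part (1)] $\Leftrightarrow \dot n \notin \mathit{SLtRg}(\mathit{LFEM}_{\Phi^{\le i}}, \varphi)$ □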
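Remark 4.4.2. The assumption of the first part of Lemma 4.4.9, $\varphi \in \Theta^{dn(i)}_{\dot n}$, can be weakened. Analysing the proof, we actually only need the requirement $\neg\mathit{UsedLater}^{\varphi}_{\dot n}[\Phi^{\le i}]$. Hence we have, for $\mathit{EM}_\Phi \in \mathcal{COEM}_\Phi$ with $\mathit{EM}_{\Phi^{<i}} = \mathit{LFEM}_{\Phi^{<i}}$, $\dot n \in \dot N$ and $\varphi \in \Phi^{<i}$:

$\neg\mathit{UsedLater}^{\varphi}_{\dot n}[\Phi^{\le i}] \Rightarrow \big(\dot n \in \mathit{SLtRg}(\mathit{EM}_{\Phi^{\le i}}, \varphi) \Leftrightarrow \exists \psi \in \mathit{SupExpr}_{\Theta^{up(i)}_{\dot n}}(\varphi).\ \neg\mathit{EM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}\big)$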

With all these results we are finally able to prove the proposed result on inductive lifetime optimality:

Theorem 4.4.3 (Lifetime Optimality Theorem for $\mathit{LFEM}_\Phi$).

$\mathit{LFEM}_\Phi$ is inductively lifetime optimal.

Proof. The proof has to consider two cases according to the definition of inductive lifetime optimality (cf. Definition 4.2.4).

$i = 0$: In this case lifetime optimality is guaranteed by the lifetime optimality of $\mathit{LEM}_\Phi$, which is applied to the 0-level expressions.

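$i > 0$: Let us consider a structured expression motion $\mathit{EM}_\Phi \in \mathcal{COEM}_\Phi$ such that

$$\mathit{EM}_{\Phi^{<i}} = \mathit{LFEM}_{\Phi^{<i}} \tag{4.27}$$

In order to prove that $\mathit{LFEM}_{\Phi^{\le i}}$ is at least as good as $\mathit{EM}_{\Phi^{\le i}}$ with respect to lifetimes, we have to show for every $\dot n \in \dot N$ the inequation:

$$|\{\varphi \in \Phi^{\le i} \mid \dot n \in \mathit{SLtRg}(\mathit{LFEM}_{\Phi^{\le i}}, \varphi)\}| \le |\{\varphi \in \Phi^{\le i} \mid \dot n \in \mathit{SLtRg}(\mathit{EM}_{\Phi^{\le i}}, \varphi)\}| \tag{4.28}$$

To keep the notation simple, let us fix $\dot n \in \dot N$ and introduce an abbreviation scheme for the set of expressions in $\Phi^{\le i}$ that are involved in a lifetime range at $\dot n$ with respect to an expression motion TR. For instance, $\mathcal{A}^{<i}_{\mathit{EM}_{\Phi^{<i}}}$ stands for those expressions in $\Phi^{<i}$ whose associated lifetime ranges with respect to $\mathit{EM}_{\Phi^{<i}}$ cover $\dot n$, i.e.

$$\mathcal{A}^{<i}_{\mathit{EM}_{\Phi^{<i}}} \stackrel{\text{def}}{=} \{\varphi \in \Phi^{<i} \mid \dot n \in \mathit{SLtRg}(\mathit{EM}_{\Phi^{<i}}, \varphi)\}$$

With this notation Proposition (4.28) reads as:

$$|\mathcal{A}^{\le i}_{\mathit{LFEM}_{\Phi^{\le i}}}| \le |\mathcal{A}^{\le i}_{\mathit{EM}_{\Phi^{\le i}}}|$$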

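Starting point of the proof is a decomposition of lifetime ranges:

$$\mathcal{A}^{\le i}_{\mathit{EM}_{\Phi^{\le i}}} = \mathcal{A}^{i}_{\mathit{EM}_{\Phi^{\le i}}} \uplus (\mathcal{A}^{<i}_{\mathit{EM}_{\Phi^{<i}}} \setminus \mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}}) \tag{4.29a}$$

$$\mathcal{A}^{\le i}_{\mathit{LFEM}_{\Phi^{\le i}}} = \mathcal{A}^{i}_{\mathit{LFEM}_{\Phi^{\le i}}} \uplus (\mathcal{A}^{<i}_{\mathit{LFEM}_{\Phi^{<i}}} \setminus \mathcal{A}^{<i,rm}_{\mathit{LFEM}_{\Phi^{<i}}}) \tag{4.29b}$$

$\mathcal{A}^{i}_{\mathit{EM}_{\Phi^{\le i}}}$ and $\mathcal{A}^{i}_{\mathit{LFEM}_{\Phi^{\le i}}}$ characterise "new" lifetime ranges due to $i$-level computations. $\mathcal{A}^{<i}_{\mathit{EM}_{\Phi^{<i}}}$ and $\mathcal{A}^{<i}_{\mathit{LFEM}_{\Phi^{<i}}}$ refer to the "old" lifetime ranges of expressions with levels lower than $i$, i.e. the lifetime ranges as they appear when the transformations are restricted to the universe $\Phi^{<i}$. Some of these lifetime ranges become obsolete due to trade-offs with $i$-level lifetime ranges, which is captured by means of the sets $\mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}}$ and $\mathcal{A}^{<i,rm}_{\mathit{LFEM}_{\Phi^{<i}}}$, respectively: $\mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}}$ consists of those $\varphi \in \mathcal{A}^{<i}_{\mathit{EM}_{\Phi^{<i}}}$ with $\dot n \notin \mathit{SLtRg}(\mathit{EM}_{\Phi^{\le i}}, \varphi)$, and analogously for $\mathit{LFEM}$.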



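For these sets we collect a number of properties that are mainly based on the results on lifetime ranges given by Lemma 4.2.7, Lemma 4.4.6 and Lemma 4.4.9:

$$\mathcal{A}^{i}_{\mathit{EM}_{\Phi^{\le i}}} \setminus \Theta^{up(i)}_{\dot n} = \mathcal{A}^{i}_{\mathit{LFEM}_{\Phi^{\le i}}} \setminus \Theta^{up(i)}_{\dot n} \tag{4.30a}$$

$$\mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}} \setminus \Theta^{dn(i)}_{\dot n} = \mathcal{A}^{<i,rm}_{\mathit{LFEM}_{\Phi^{<i}}} \setminus \Theta^{dn(i)}_{\dot n} \tag{4.30b}$$

$$\mathit{neigh}(\mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}} \cap \Theta^{dn(i)}_{\dot n}) \subseteq \mathcal{A}^{i}_{\mathit{EM}_{\Phi^{\le i}}} \cap \Theta^{up(i)}_{\dot n} \tag{4.30c}$$

$$\mathcal{A}^{<i,rm}_{\mathit{LFEM}_{\Phi^{<i}}} \cap \Theta^{dn(i)}_{\dot n} = \{\varphi \in \Theta^{dn(i)}_{\dot n} \mid \dot n \notin \mathit{SLtRg}(\mathit{LFEM}_{\Phi^{\le i}}, \varphi)\} \tag{4.30d}$$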


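Starting with Equation (4.30a), let $\psi \in \mathcal{A}^{i}_{\mathit{EM}_{\Phi^{\le i}}} \setminus \Theta^{up(i)}_{\dot n}$. Then the definition of $\mathcal{A}^{i}_{\mathit{EM}_{\Phi^{\le i}}}$ delivers $\dot n \in \mathit{SLtRg}(\mathit{EM}_{\Phi^{\le i}}, \psi)$. According to Lemma 4.4.6(2b) this implies both $\mathit{EM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$ and $\mathit{UsedLater}^{\psi}_{\dot n}[\Phi^{\le i}]$. Using Lemma 4.4.5(1) and Lemma 4.4.4 together with $\psi \notin \Theta^{up(i)}_{\dot n}$, the condition $\mathit{EM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$ implies $\mathit{LEM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$. In turn, we obtain $\mathit{LFEM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$ by Corollary 3.4.1, and Lemma 4.4.6(2a) finally delivers $\dot n \in \mathit{SLtRg}(\mathit{LFEM}_{\Phi^{\le i}}, \psi)$. The $\supseteq$-inclusion follows by a symmetric argument.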



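For the proof of Equation (4.30b) we argue as follows:

$\mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}} \setminus \Theta^{dn(i)}_{\dot n}$

[†] $= \{\varphi \in \mathcal{A}^{<i}_{\mathit{EM}_{\Phi^{<i}}} \setminus \Theta^{dn(i)}_{\dot n} \mid \neg\mathit{UsedLater}^{\varphi}_{\dot n}[\Phi^{\le i}] \wedge \mathit{SupExpr}_{\Theta^{up(i)}_{\dot n}}(\varphi) = \emptyset\}$

[(4.27)] $= \{\varphi \in \mathcal{A}^{<i}_{\mathit{LFEM}_{\Phi^{<i}}} \setminus \Theta^{dn(i)}_{\dot n} \mid \neg\mathit{UsedLater}^{\varphi}_{\dot n}[\Phi^{\le i}] \wedge \mathit{SupExpr}_{\Theta^{up(i)}_{\dot n}}(\varphi) = \emptyset\}$

[‡] $= \mathcal{A}^{<i,rm}_{\mathit{LFEM}_{\Phi^{<i}}} \setminus \Theta^{dn(i)}_{\dot n}$

Considering Equation (†), let us first take a look at the $\subseteq$-inclusion, where we have

$\varphi \in \mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}} \setminus \Theta^{dn(i)}_{\dot n}$

[Definition $\mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}}$] $\Rightarrow \dot n \in \mathit{SLtRg}(\mathit{EM}_{\Phi^{<i}}, \varphi) \wedge \dot n \notin \mathit{SLtRg}(\mathit{EM}_{\Phi^{\le i}}, \varphi)$

[Lemma 4.2.7] $\Rightarrow \mathit{EM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n} \wedge \dot n \notin \mathit{SLtRg}(\mathit{EM}_{\Phi^{\le i}}, \varphi)$

[Lemma 4.4.6(2a)] $\Rightarrow \neg\mathit{UsedLater}^{\varphi}_{\dot n}[\Phi^{\le i}]$

This, together with the definition of $\Theta^{dn(i)}_{\dot n}$ and $\varphi \notin \Theta^{dn(i)}_{\dot n}$, also immediately excludes that $\varphi$ has a superexpression in $\Theta^{up(i)}_{\dot n}$.

For the $\supseteq$-inclusion of (†) we may directly argue with Lemma 4.4.9(1) in its variant as stated in Remark 4.4.2. Finally, for Equation (‡) the same reasoning as for Equation (†) applies.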

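For Inclusion (4.30c) let us consider $\varphi \in \mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}} \cap \Theta^{dn(i)}_{\dot n}$ and $\psi \in \mathit{neigh}(\varphi)$. The fact that $\dot n \notin \mathit{SLtRg}(\mathit{EM}_{\Phi^{\le i}}, \varphi)$ makes Lemma 4.4.9(1) applicable, yielding $\mathit{EM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$. Due to Lemma 4.4.7(2a) we have in addition $\mathit{UsedLater}^{\psi}_{\dot n}[\Phi^{\le i}]$. Putting both results together, Lemma 4.4.6(2a) delivers $\dot n \in \mathit{SLtRg}(\mathit{EM}_{\Phi^{\le i}}, \psi)$, as desired.

Finally, the $\subseteq$-inclusion of Equation (4.30d) is trivial, while for the $\supseteq$-inclusion Lemma 4.4.7(2b,3) yields

$\mathit{LFEM}_{\Phi^{<i}}\text{-}\mathit{Correct}^{\varphi}_{\dot n} \wedge \mathit{UsedLater}^{\varphi}_{\dot n}[\Phi^{<i}]$,

which by Lemma 4.4.6(2a) means $\varphi \in \mathcal{A}^{<i}_{\mathit{LFEM}_{\Phi^{<i}}}$.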

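The key properties, however, are the ones that close the gap between optimal trade-offs and lifetime ranges. Here Lemma 4.4.9(2) gives us:

$$\mathcal{A}^{i}_{\mathit{LFEM}_{\Phi^{\le i}}} \cap \Theta^{up(i)}_{\dot n} = \mathit{neigh}(T(\Theta^{dn(i)}_{\dot n})) \tag{4.31a}$$

$$\{\varphi \in \Theta^{dn(i)}_{\dot n} \mid \dot n \notin \mathit{SLtRg}(\mathit{LFEM}_{\Phi^{\le i}}, \varphi)\} = T(\Theta^{dn(i)}_{\dot n}) \tag{4.31b}$$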

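Now we may calculate, writing $\Theta^{up}$ and $\Theta^{dn}$ for $\Theta^{up(i)}_{\dot n}$ and $\Theta^{dn(i)}_{\dot n}$, and $R$ for the common part $|\mathcal{A}^{<i}_{\mathit{LFEM}_{\Phi^{<i}}}| + |\mathcal{A}^{i}_{\mathit{LFEM}_{\Phi^{\le i}}} \setminus \Theta^{up}| - |\mathcal{A}^{<i,rm}_{\mathit{LFEM}_{\Phi^{<i}}} \setminus \Theta^{dn}|$:

$$\begin{aligned}
|\mathcal{A}^{\le i}_{\mathit{EM}_{\Phi^{\le i}}}|
&= |\mathcal{A}^{i}_{\mathit{EM}_{\Phi^{\le i}}}| + |\mathcal{A}^{<i}_{\mathit{EM}_{\Phi^{<i}}}| - |\mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}}|
&& \text{[Equation (4.29a)]}\\
&= |\mathcal{A}^{i}_{\mathit{EM}_{\Phi^{\le i}}} \setminus \Theta^{up}| + |\mathcal{A}^{i}_{\mathit{EM}_{\Phi^{\le i}}} \cap \Theta^{up}| + |\mathcal{A}^{<i}_{\mathit{EM}_{\Phi^{<i}}}| \\
&\quad - |\mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}} \setminus \Theta^{dn}| - |\mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}} \cap \Theta^{dn}|
&& \text{[Splitting of } \mathcal{A}^{i}_{\mathit{EM}_{\Phi^{\le i}}} \text{ and } \mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}}\text{]}\\
&= |\mathcal{A}^{<i}_{\mathit{LFEM}_{\Phi^{<i}}}| + |\mathcal{A}^{i}_{\mathit{LFEM}_{\Phi^{\le i}}} \setminus \Theta^{up}| + |\mathcal{A}^{i}_{\mathit{EM}_{\Phi^{\le i}}} \cap \Theta^{up}| \\
&\quad - |\mathcal{A}^{<i,rm}_{\mathit{LFEM}_{\Phi^{<i}}} \setminus \Theta^{dn}| - |\mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}} \cap \Theta^{dn}|
&& \text{[Equations (4.27), (4.30a), (4.30b)]}\\
&= R + \big(|\mathcal{A}^{i}_{\mathit{EM}_{\Phi^{\le i}}} \cap \Theta^{up}| - |\mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}} \cap \Theta^{dn}|\big)
&& \text{[Rearranging terms]}\\
&\ge R - \big(|\mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}} \cap \Theta^{dn}| - |\mathit{neigh}(\mathcal{A}^{<i,rm}_{\mathit{EM}_{\Phi^{<i}}} \cap \Theta^{dn})|\big)
&& \text{[Inclusion (4.30c)]}\\
&\ge R - \big(|T(\Theta^{dn})| - |\mathit{neigh}(T(\Theta^{dn}))|\big)
&& \text{[Theorem 4.3.3]}\\
&= R - \big(|\{\varphi \in \Theta^{dn} \mid \dot n \notin \mathit{SLtRg}(\mathit{LFEM}_{\Phi^{\le i}}, \varphi)\}| - |\mathcal{A}^{i}_{\mathit{LFEM}_{\Phi^{\le i}}} \cap \Theta^{up}|\big)
&& \text{[Equations (4.31a,b)]}\\
&= R - \big(|\mathcal{A}^{<i,rm}_{\mathit{LFEM}_{\Phi^{<i}}} \cap \Theta^{dn}| - |\mathcal{A}^{i}_{\mathit{LFEM}_{\Phi^{\le i}}} \cap \Theta^{up}|\big)
&& \text{[Equation (4.30d)]}\\
&= |\mathcal{A}^{<i}_{\mathit{LFEM}_{\Phi^{<i}}}| + |\mathcal{A}^{i}_{\mathit{LFEM}_{\Phi^{\le i}}}| - |\mathcal{A}^{<i,rm}_{\mathit{LFEM}_{\Phi^{<i}}}|
&& \text{[Merging of } \mathcal{A}^{i}_{\mathit{LFEM}_{\Phi^{\le i}}} \text{ and } \mathcal{A}^{<i,rm}_{\mathit{LFEM}_{\Phi^{<i}}}\text{]}\\
&= |\mathcal{A}^{\le i}_{\mathit{LFEM}_{\Phi^{\le i}}}|
&& \text{[Equation (4.29b)]}
\end{aligned}$$

□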
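The step labelled [Theorem 4.3.3] only uses the fact that no subset $S \subseteq \Theta^{dn(i)}_{\dot n}$ has a larger deficiency $|S| - |\mathit{neigh}(S)|$ than the tight set. For very small graphs this can be cross-checked by exhaustive enumeration. The following exponential sketch is a testing aid only, not the polynomial algorithm of Section 4.3; the graph encoding and names are hypothetical. Because the deficiency is a supermodular set function, the union of all maximisers is again a maximiser, so the sketch returns it as the largest one.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def deficiency(s: Iterable[str], adj: Dict[str, List[str]]) -> int:
    """|S| - |neigh(S)| in the bipartite graph given by adj."""
    members = set(s)
    return len(members) - len({u for d in members for u in adj.get(d, [])})


def brute_force_max_deficiency(dn: List[str], adj: Dict[str, List[str]]) -> Tuple[int, Set[str]]:
    """Maximal deficiency over all subsets of dn and the union of all maximisers."""
    subsets = [set(c) for r in range(len(dn) + 1) for c in combinations(dn, r)]
    best = max(deficiency(s, adj) for s in subsets)
    union: Set[str] = set()
    for s in subsets:
        if deficiency(s, adj) == best:
            union |= s
    return best, union
```

The empty set always has deficiency 0, so the maximum is never negative; in lifetime terms, a positive maximum at $\dot n$ is exactly the number of temporaries saved there by the optimal trade-off.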



4.4.5 Computing $\mathit{LFEM}_\Phi$

As in Section 3.5, where the equation systems for individual busy and lazy expression motion were given, this section aims at providing a brief, algorithmically oriented summary of the steps of $\mathit{LFEM}_\Phi$.

Table 4.1 gives an overview of the general structure of the algorithm. In fact, the algorithm is essentially a refinement approach, i.e. all the analyses that are associated with $\mathit{LEM}_\Phi$ are computed just as before. Each of the two adjustment steps only requires one additional data flow analysis, which is triggered levelwise.

The detailed steps associated with the first and the second refinement step are summarised in Tables 4.2 and 4.3, respectively. It should be noted that both delayability adjustments in Table 4.2(4) and Table 4.3(3) use the alternative characterisations given in Lemma 4.4.1(3) and Lemma 4.4.8, respectively. Finally, the practical implementation (as opposed to the theoretically oriented presentation earlier) only requires trade-off information that is restricted to exit points of nodes. This is due to the fact that a profitable trade-off whose origin is at the entry of a node remains profitable also at the exit (cf. Lemma 4.4.8).
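A minimal sketch of how the first adjustment could be computed on a node-level flow graph: a forward must-analysis (greatest fixed point) for the correctness of one subexpression, followed by the pointwise refinements $\mathit{A1\text{-}DnSafe} = \mathit{DnSafe} \wedge$ "all subexpressions correct" and $\mathit{A1\text{-}Delayed} = \mathit{Delayed} \wedge \mathit{A1\text{-}DnSafe}$, the latter being our reading of Lemma 4.4.1(3). Only the entry equation of the correctness analysis appears in this excerpt; the exit equation used below ($\mathit{XCORRECT} = \mathit{XInsert} + \mathit{Transp} \cdot \mathit{NCORRECT}$) is an assumption, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Pred = Dict[str, bool]


def correctness_analysis(nodes: List[str], pred: Dict[str, List[str]], start: str,
                         n_insert: Pred, x_insert: Pred,
                         transp: Pred) -> Tuple[Pred, Pred]:
    """Forward must-analysis (greatest fixed point) for one subexpression."""
    n_corr = {n: True for n in nodes}
    x_corr = {n: True for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            # Entry equation: false at the start node, otherwise all predecessors.
            incoming = False if n == start else all(x_corr[m] for m in pred[n])
            new_n = n_insert[n] or incoming
            # Assumed exit equation: inserted at the exit, or correct at the
            # entry and the node is transparent for the subexpression.
            new_x = x_insert[n] or (transp[n] and new_n)
            if (new_n, new_x) != (n_corr[n], x_corr[n]):
                n_corr[n], x_corr[n] = new_n, new_x
                changed = True
    return n_corr, x_corr


def first_adjustment(dnsafe: Pred, delayed: Pred,
                     sub_correct: List[Pred]) -> Tuple[Pred, Pred]:
    """Pointwise A1-DnSafe and A1-Delayed for one expression of level i."""
    a1_dnsafe = {n: dnsafe[n] and all(c[n] for c in sub_correct) for n in dnsafe}
    a1_delayed = {n: delayed[n] and a1_dnsafe[n] for n in delayed}
    return a1_dnsafe, a1_delayed
```

Starting from "everything correct" and only ever switching values to false makes the iteration terminate at the greatest fixed point, as required for a must-property such as correctness.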
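1. Replacement Points ($\psi \in \Phi$):

   $\mathrm{LFEM}_\Phi\text{-}\mathit{Replace}^\psi_n :\Leftrightarrow \psi \in \mathit{MaxSubExpr}_\Phi(\varphi^{RHS}_n)$, where $\varphi^{RHS}_n$ is the RHS expression at $n$

2. Relevant Global Analyses ($\psi \in \Phi$):

   Computation of all relevant global predicates of $\mathrm{LEM}_\Phi$ according to Table 3.1 and Table 3.2:
   - $\mathit{NDnSafe}^\psi_n$, $\mathit{XDnSafe}^\psi_n$, $\mathit{NUpSafe}^\psi_n$, $\mathit{XUpSafe}^\psi_n$
   - $\mathit{NDelayed}^\psi_n$, $\mathit{XDelayed}^\psi_n$

3. Insertion Points wrt. $\Phi^0$ ($\psi \in \Phi^0$):
   - $\mathrm{LFEM}_\Phi\text{-}\mathit{NInsert}^\psi_n :\Leftrightarrow \mathit{N\text{-}Latest}^\psi_n$
   - $\mathrm{LFEM}_\Phi\text{-}\mathit{XInsert}^\psi_n :\Leftrightarrow \mathit{X\text{-}Latest}^\psi_n$

4. Insertion Points wrt. $\Phi^i$ ($i > 0$):

   For each level $i > 0$ in increasing order do
   a) Perform the first adjustment as described in Table 4.2.
   b) Perform the second adjustment as described in Table 4.3.
   c) Determine the insertion points ($\psi \in \Phi^i$):
      - $\mathrm{LFEM}_\Phi\text{-}\mathit{NInsert}^\psi_n :\Leftrightarrow \mathit{A2\text{-}NLatest}^\psi_n$
      - $\mathrm{LFEM}_\Phi\text{-}\mathit{XInsert}^\psi_n :\Leftrightarrow \mathit{A2\text{-}XLatest}^\psi_n$

Table 4.1. Skeleton of the LFEM_Φ-algorithm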
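The following Python sketch illustrates the levelwise control structure of step 4 of Table 4.1. It rests on several assumptions made for illustration only:

- expressions are nested tuples `(op, operand, ...)`, with variables and constants written as strings;
- an expression whose operands are all variables or constants has level 0;
- the three callbacks passed to the hypothetical driver `lfem_skeleton` stand for implementations of Table 4.2, Table 4.3 and the insertion-point step.

```python
from collections import defaultdict

def level(expr):
    """Level of an expression tree (assumption: operators applied only to
    variables/constants have level 0). Plain operands have no level (None)."""
    if isinstance(expr, str):
        return None
    _op, *operands = expr
    sub_levels = [level(o) for o in operands if not isinstance(o, str)]
    return 1 + max(sub_levels) if sub_levels else 0

def lfem_skeleton(exprs, first_adjustment, second_adjustment, insertion_points):
    """Hypothetical driver following steps 3 and 4 of Table 4.1."""
    by_level = defaultdict(list)
    for e in exprs:
        by_level[level(e)].append(e)
    processed = []
    for i in sorted(by_level):                  # levels in increasing order
        if i > 0:                               # level 0 keeps the LEM insertions
            first_adjustment(i, by_level[i])    # Table 4.2
            second_adjustment(i, by_level[i])   # Table 4.3
        insertion_points(i, by_level[i])
        processed.append(i)
    return processed
```

Processing levels in increasing order guarantees that the correctness information of all lower levels is final before level i is adjusted.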



4.5 Fully Flexible Expression Motion 

Although LFEM^ is a major step forwards in capturing trade-offs between life- 
times of temporaries, we have to be aware of the fact that the inductive notion 
of lifetime optimality (cf. Definition 4.2.4) is weaker than the “full” notion 
of lifetime optimality (cf. Definition 4.2.3). In this section we will present, 
how even full lifetime optimality can be reached efficiently. In contrast to the 
inductively progressing LFEM^>, where bipartite graphs are natural to model 
the relation between the expressions of the actual and lower levels, the gen- 
eral problem can be expressed more adequately in terms of expression DAGs. 
Fortunately, the situation which seems to be more complex at first glance can 
be reduced to the bipartite case. 

In fact, the algorithm for the computation of tight sets of a bipartite
graph still provides the algorithmic kernel of the full method. The
section is organised as follows. First we give a classification of the expressions
at a program point that are relevant for the general trade-off problem. After
discussing limitations of LFEM_Φ, we introduce a more general view of profitable
trade-offs between the lifetimes of temporaries and show how to reduce the
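1. Correctness Analysis ($\varphi \in \mathit{SubExpr}_\Phi(\Phi^i)$):

$$\mathit{NCORRECT}^\varphi_n = \mathrm{LFEM}_{\Phi^{<i}}\text{-}\mathit{NInsert}^\varphi_n + \begin{cases} \mathit{false} & \text{if } n = s\\ \prod_{m \in pred(n)} \mathit{XCORRECT}^\varphi_m & \text{otherwise} \end{cases}$$

$$\mathit{XCORRECT}^\varphi_n = \mathrm{LFEM}_{\Phi^{<i}}\text{-}\mathit{XInsert}^\varphi_n + \mathit{Transp}^\varphi_n \cdot \mathit{NCORRECT}^\varphi_n$$

Greatest fixed point solution: $\mathrm{LFEM}_{\Phi^{<i}}\text{-}\mathit{NCorrect}$ and $\mathrm{LFEM}_{\Phi^{<i}}\text{-}\mathit{XCorrect}$

2. Adjusting Down-Safety ($\psi \in \Phi^i$; no data flow analysis!)

$$\mathit{A\text{-}NDnSafe}^\psi_n = \mathit{NDnSafe}^\psi_n \cdot \prod_{\varphi \in \mathit{SubExpr}_\Phi(\psi)} \mathrm{LFEM}_{\Phi^{<i}}\text{-}\mathit{NCorrect}^\varphi_n$$

$$\mathit{A\text{-}XDnSafe}^\psi_n = \mathit{XDnSafe}^\psi_n \cdot \prod_{\varphi \in \mathit{SubExpr}_\Phi(\psi)} \mathrm{LFEM}_{\Phi^{<i}}\text{-}\mathit{XCorrect}^\varphi_n$$

3. Computation of Earliestness ($\psi \in \Phi^i$; no data flow analysis!)

$$\mathit{A1\text{-}NEarliest}^\psi_n = \mathit{A\text{-}NDnSafe}^\psi_n \cdot \begin{cases} \mathit{true} & \text{if } n = s\\ \sum_{m \in pred(n)} \overline{\mathit{XUpSafe}^\psi_m + \mathit{A\text{-}XDnSafe}^\psi_m} & \text{otherwise} \end{cases}$$

$$\mathit{A1\text{-}XEarliest}^\psi_n = \mathit{A\text{-}XDnSafe}^\psi_n \cdot \overline{\mathit{Transp}^\psi_n}$$

Table 4.2. (Part 1) Computing the first adjustment of LFEM_Φ with respect to level
i > 0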

general situation to the well-known problem of computing tight sets of
bipartite graphs. A condensed presentation of the results of this section can be
found in [Rüt98b].

4.5.1 A Classification of Expressions 

Among the expressions of Φ we distinguish between four categories of
expressions with respect to a program point ṅ. These four classes evolve from two
dissections of the set of expressions. The first dissection separates used-later
expressions from release candidates. The former are those whose associated
temporaries are definitely used at a later program point. The latter are those
whose temporaries can possibly be released.













4. Optimal Expression Motion: The Multiple-Expression View 



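4. Adjusting Delayability ($\psi \in \Phi^i$; no data flow analysis!)

$$\mathit{A1\text{-}NDelayed}^\psi_n = \mathit{A\text{-}NDnSafe}^\psi_n \cdot \mathit{NDelayed}^\psi_n$$

$$\mathit{A1\text{-}XDelayed}^\psi_n = \mathit{A\text{-}XDnSafe}^\psi_n \cdot \mathit{XDelayed}^\psi_n$$

5. Adjusting Latestness ($\psi \in \Phi^i$; no data flow analysis!)

$$\mathit{A1\text{-}NLatest}^\psi_n = \mathit{A1\text{-}NDelayed}^\psi_n \cdot \mathit{Comp}^\psi_n$$

$$\mathit{A1\text{-}XLatest}^\psi_n = \mathit{A1\text{-}XDelayed}^\psi_n \cdot \sum_{m \in succ(n)} \overline{\mathit{A1\text{-}NDelayed}^\psi_m}$$

Table 4.2. (Part 2) Computing the first adjustment of LFEM_Φ with respect to level
i > 0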
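Steps 2, 4 and 5 of Table 4.2 are purely local. Once the correctness analysis is available, they can be evaluated pointwise. The Python sketch below does this for a single level-i expression, as an illustration only. It omits step 3. The dictionary keys are assumptions of this sketch. In particular, `NSubCorrect` and `XSubCorrect` are assumed to already hold the conjunction of $\mathrm{LFEM}_{\Phi^{<i}}\text{-}\mathit{Correct}$ over all subexpressions, i.e. the products of step 2.

```python
def first_adjustment_local(nodes, succ, P):
    """Steps 2, 4 and 5 of Table 4.2 for one level-i expression.
    P maps a predicate name to a dict node -> bool; 'NSubCorrect' and
    'XSubCorrect' are assumed to be the pre-conjoined subexpression correctness."""
    a = {}
    a['NDnSafe'] = {n: P['NDnSafe'][n] and P['NSubCorrect'][n] for n in nodes}
    a['XDnSafe'] = {n: P['XDnSafe'][n] and P['XSubCorrect'][n] for n in nodes}
    a['NDelayed'] = {n: a['NDnSafe'][n] and P['NDelayed'][n] for n in nodes}
    a['XDelayed'] = {n: a['XDnSafe'][n] and P['XDelayed'][n] for n in nodes}
    a['NLatest'] = {n: a['NDelayed'][n] and P['Comp'][n] for n in nodes}
    # an exit is latest if delayability ends at the entry of some successor
    a['XLatest'] = {n: a['XDelayed'][n] and any(not a['NDelayed'][m] for m in succ[n])
                    for n in nodes}
    return a
```

Take a node whose entry lacks some correct subexpression. That entry is no longer adjusted-delayable, even if the unadjusted LEM predicates hold there.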



The second dissection, register expressions vs. initialisation candidates,
models whether the associated temporaries are definitely initialised before
or whether we can choose between initialising the temporary now or
later.

In order to give a formal definition of these classes we have to modify
the used-later predicate of Section 4.4.1 (see page 61). As we no longer
proceed levelwise, the variant of the predicate considered here can be
determined at once for all expressions and can be based entirely on BEM_Φ:



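$$\mathit{UsedLater}^\varphi_{\dot n} :\Leftrightarrow \exists\, p \in \mathbf{P}[\dot n, e]\ \exists\, 1 < j \le |p|.\ \Big(\mathrm{BEM}_\Phi\text{-}\mathit{Replace}^\varphi_{p_j} \vee \exists\, \psi \in \mathit{SupExpr}_\Phi(\varphi).\ \mathit{Earliest}^\psi_{p_j}\Big) \wedge \forall\, 1 < k \le j.\ \neg\mathit{Earliest}^\varphi_{p_k}$$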
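UsedLater is an existential path property, so it can be computed backwards as a least fixed point. The Python sketch below is an illustration, assuming a graph of program points in which every point reaches the end node e. The three input predicates are assumed to be given per point for one expression φ. The helper value `reach_use[x]` states that some path starting at x reaches one of two things without passing a point where φ itself is earliest (x included):

- a replacement of φ, or
- an earliest point of a superexpression of φ.

UsedLater at ṅ then holds iff this helper value holds at some successor of ṅ, which reflects the strict bound 1 < j.

```python
def used_later(points, succ, replace, sup_earliest, earliest):
    """Least fixed point sketch of UsedLater for one expression phi.
    replace[x]: BEM-Replace of phi at x; sup_earliest[x]: some superexpression
    of phi is earliest at x; earliest[x]: phi itself is earliest at x."""
    reach_use = {x: False for x in points}      # least fixed point: start at bottom
    changed = True
    while changed:
        changed = False
        for x in points:
            new = (not earliest[x]) and (replace[x] or sup_earliest[x]
                                         or any(reach_use[m] for m in succ[x]))
            if new and not reach_use[x]:
                reach_use[x] = True
                changed = True
    # the use must lie strictly behind the current point (1 < j)
    return {x: any(reach_use[m] for m in succ[x]) for x in points}
```

On a straight-line sequence of points 0, 1, 2, 3 with a replacement at point 3, an earliest point of φ at point 2 cuts off the use for points 0 and 1.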

The formal criteria for the proposed classification are given below,
together with graphical attributes that are assigned to each class of expressions.
These attributes are used to label the vertices of an expression DAG
that represents the subexpression relation among the classified expressions at
a program point. Shaded symbols are used for register expressions, while
unshaded ones stand for initialisation candidates. Moreover, expressions whose
associated temporaries are definitely used later are represented by a square,
while release candidates are represented by a circle. Thus the formal
classification for an expression ψ ∈ Φ and a program point ṅ ∈ Ṅ is:

Register expression & Used-later expression:

Formal condition: $\mathrm{LEM}_\Phi\text{-}\mathit{Correct}^\psi_{\dot n} \wedge \mathit{UsedLater}^\psi_{\dot n}$

Attribute: ■

Register expression & Release candidate:

Formal condition: $\mathrm{LEM}_\Phi\text{-}\mathit{Correct}^\psi_{\dot n} \wedge \neg\mathit{UsedLater}^\psi_{\dot n}$

Attribute: ●

Initialisation candidate & Used-later expression:

Formal condition: $\mathit{Delayed}^\psi_{\dot n} \wedge \neg\mathit{Latest}^\psi_{\dot n} \wedge \mathit{UsedLater}^\psi_{\dot n}$

Attribute: □

Initialisation candidate & Release candidate:

Formal condition: $\mathit{Delayed}^\psi_{\dot n} \wedge \neg\mathit{Latest}^\psi_{\dot n} \wedge \neg\mathit{UsedLater}^\psi_{\dot n}$

Attribute: ○
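The four formal conditions map directly onto the vertex labels. A small Python sketch, assuming the predicate values at the program point are already known for the expression in question (the function name `classify` is hypothetical). It raises an error if both the register and the initialisation-candidate conditions hold. The text rules this out for inputs derived from LEM_Φ.

```python
def classify(lem_correct, delayed, latest, used_later):
    """Label of an expression at a program point (Section 4.5.1), or None if the
    expression is neither a register expression nor an initialisation candidate."""
    register = lem_correct
    init_candidate = delayed and not latest
    if register and init_candidate:
        raise ValueError("conditions are mutually exclusive for LEM-based inputs")
    if register:
        return '■' if used_later else '●'
    if init_candidate:
        return '□' if used_later else '○'
    return None
```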

Note that, in contrast to the inductive approach, the used-later predicate as
well as the correctness predicate can be determined completely in advance. It
should also be noted that the used-later predicate is based on BEM_Φ, while the
correctness predicate is based on LEM_Φ. This way the transformation can
exploit information about the full universe, which in particular makes the new
approach sensitive to superexpressions of any level. Using the computational
optimality of LEM_Φ it is easy to see that the formal conditions for register
expressions and initialisation candidates are mutually exclusive. Therefore, the
classification of an expression is uniquely determined. Figure 4.12 shows a
labelled expression DAG (a formal definition is given in Section 4.5.3)
associated with the exit of node 2, where expressions of all four kinds can be
observed.

Fig. 4.12. Classification of expressions

Obviously, $\varphi_1$ and $\varphi_2$ are register expressions, since their latest
initialisation points precede the exit of node 2. The only initialisation relying
on $\varphi_1$ is the one associated with the composite expression
$\varphi_1 * \varphi_2$, which may already be placed at the exit of node 2. Hence
$\varphi_1$ is a release candidate. On the other hand, $\varphi_2$ has a strictly
later use at node 5, where an original occurrence of $\varphi_2$ has to be
replaced, which makes $\varphi_2$ a used-later expression. Complementarily, both
initialisations associated with $\varphi_1 * \varphi_2$ and with the larger
expression $\varphi_1 * \varphi_2 + \ldots$ are initialisation candidates, as their
corresponding initialisations can be delayed across the exit of node 2. However,
only an initialisation with respect to $\varphi_1 * \varphi_2 + \ldots$ must be used
strictly later at node 4, while its earliest initialisation point at the exit of
node 2 makes $\varphi_1 * \varphi_2$ a release candidate.



4.5.2 Limitations of Inductive Lifetime Optimality

Before giving a precise notion of general trade-offs between lifetimes of
temporaries, we briefly discuss the limitations of the inductive notion of
lifetime optimality, as this helps to elucidate the additional power of the
general approach.

The most striking deficiency of LFEM_Φ, as a method that only aims at
inductive lifetime optimality, is its missing up-sensitivity. It only captures
trade-offs between the current level and levels already processed, but cannot
recognise "future opportunities" of higher levels. Figure 4.13(a) shows a
labelled expression DAG as it could possibly be associated with a program
point ṅ. The point of this example is that we can get rid of three lifetimes
associated with the 0-level expressions. The cost is one additional
initialisation with respect to the unique large expression of level 3. All
initialisations associated with expressions of levels 1 and 2 only induce local
lifetimes of their associated temporaries. LFEM_Φ, however, would not recognise
this opportunity, as the trade-off balance of three 0-level lifetimes against
four 1-level lifetimes is negative. (Note, however, that as soon as one of the
initialisations associated with a 1-level expression is forced to be placed in a
(mediate) successor of ṅ, all other initialisations would be enabled, because
there are no non-profitable intermediate trade-offs anymore.)

A special case of this effect shows up for the expressions of level 0. The
levelwise approach assumes that these are placed best by means of lazy
expression motion, because no trade-offs with expressions of lower levels can
be exploited. However, even this can be suboptimal, as depicted in Figure
4.13(b). In this figure the two operand expressions of type ● can be traded
against the expression of type □, since the operand of type ○ only causes a
local lifetime. However, the inductive approach treats the expressions of level
0 uniformly by means of lazy expression motion. Thus trade-offs between levels
1 and 0 are only considered if level 0 solely consists of operands of type ●.

Fig. 4.13. Limitations of inductive lifetime optimality: a) missed trade-off due to
intermediate degradation; b) missed trade-off due to a non-register expression as
0-level operand

4.5.3 General Trade-Offs

Labelled Expression DAGs. With the previous classification of expressions,
the general trade-off situation between the lifetimes of temporaries can be
expressed quite naturally in terms of conditions on expression DAGs.
Formally, the labelled expression DAG associated with a program point ṅ
is a triple $D_{\dot n} = (\Theta_{\dot n}, E_{\dot n}, \ell_{\dot n})$, where
- $\Theta_{\dot n}$ is the subset of expressions in $\Phi$ that can be classified according to one of the four criteria of Section 4.5.1,
- $\ell_{\dot n} : \Theta_{\dot n} \to \{\bullet, \blacksquare, \circ, \square\}$ is the corresponding labelling of the vertices in $\Theta_{\dot n}$, and
- $E_{\dot n} \subseteq \Theta_{\dot n} \times \Theta_{\dot n}$ is the set of directed edges defined by:

$$(\psi, \varphi) \in E_{\dot n} :\Leftrightarrow \ell_{\dot n}(\psi) \in \{\circ, \square\} \wedge \varphi \in \mathit{SubExpr}_{\Theta_{\dot n}}(\psi)$$

Because subexpression relations among register expressions are irrelevant, $E_{\dot n}$
can be considered as the relevant part of the (immediate) subexpression
relation. This view is justified in particular because the first point of the
following lemma ensures completeness.

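Lemma 4.5.1 (Expression DAG Lemma).

Let $D_{\dot n} = (\Theta_{\dot n}, E_{\dot n}, \ell_{\dot n})$ be the labelled expression DAG with respect to $\dot n \in \dot N$.
Then we have the following properties:

1. $\forall \psi \in \Theta_{\dot n}.\ \ell_{\dot n}(\psi) \in \{\circ, \square\} \Rightarrow \forall \varphi \in \mathit{SubExpr}_\Phi(\psi).\ \varphi \in \Theta_{\dot n}$

2. $\forall \varphi \in \Theta_{\dot n}.\ \ell_{\dot n}(\varphi) \in \{\circ, \square\} \Rightarrow \forall \psi \in \mathit{SupExpr}_{\Theta_{\dot n}}(\varphi).\ \ell_{\dot n}(\psi) \in \{\circ, \square\}$

3. $\forall \varphi \in \Theta_{\dot n}.\ \ell_{\dot n}(\varphi) \in \{\bullet, \blacksquare\} \Rightarrow \varphi$ is a leaf node of $D_{\dot n}$

Proof. Part (1) is just a reformulation of Lemma 4.2.5(1). For the proof of
Part (2) let us assume $\psi \in \mathit{SupExpr}_{\Theta_{\dot n}}(\varphi)$ such that $\ell_{\dot n}(\psi) \in \{\bullet, \blacksquare\}$. This
means that $\mathrm{LEM}_\Phi\text{-}\mathit{Correct}^\psi_{\dot n}$ holds. According to Lemma 4.2.6 this induces
$\mathrm{LEM}_\Phi\text{-}\mathit{Correct}^\varphi_{\dot n}$, too, which, however, would mean that $\ell_{\dot n}(\varphi) \in \{\bullet, \blacksquare\}$, in
contradiction to the assumption. Finally, Part (3) is immediate from the
construction of $E_{\dot n}$. □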

Optimal Trade-Offs in Labelled Expression DAGs. Now the trade-off
problem for a given labelled expression DAG $D_{\dot n} = (\Theta_{\dot n}, E_{\dot n}, \ell_{\dot n})$ can be
formulated. To this end, let us first consider a formalisation of profitable
trade-offs between register expressions and initialisation candidates:
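Definition 4.5.1 (Trade-off Pair). $(S^{\mathrm{lo}}_{\dot n}, S^{\mathrm{up}}_{\dot n})$ with $S^{\mathrm{lo}}_{\dot n}, S^{\mathrm{up}}_{\dot n} \subseteq \Theta_{\dot n}$ is a
trade-off pair at $\dot n$ if and only if the following conditions are satisfied:

1. $S^{\mathrm{lo}}_{\dot n}$ is a set of register expressions, i.e.: $\forall \varphi \in S^{\mathrm{lo}}_{\dot n}.\ \ell_{\dot n}(\varphi) \in \{\bullet, \blacksquare\}$

2. $S^{\mathrm{up}}_{\dot n}$ is a set of initialisation candidates, i.e.: $\forall \psi \in S^{\mathrm{up}}_{\dot n}.\ \ell_{\dot n}(\psi) \in \{\circ, \square\}$

3. $S^{\mathrm{lo}}_{\dot n}$ is covered by $S^{\mathrm{up}}_{\dot n}$, i.e.: $\forall \varphi \in S^{\mathrm{lo}}_{\dot n},\ \psi \in pred_{D_{\dot n}}(\varphi).\ \psi \in S^{\mathrm{up}}_{\dot n}$

4. $S^{\mathrm{up}}_{\dot n}$ is well-structured, i.e.: $\forall \psi \in S^{\mathrm{up}}_{\dot n}\ \forall \varphi \in succ_{D_{\dot n}}(\psi).\ \ell_{\dot n}(\varphi) \in \{\circ, \square\} \Rightarrow \varphi \in S^{\mathrm{up}}_{\dot n}$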

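Informally, $S^{\mathrm{lo}}_{\dot n}$ defines a set of register expressions whose registers can
(partly) be released at the cost of introducing new lifetime ranges associated
with the initialisation of large expressions in $S^{\mathrm{up}}_{\dot n}$. To be more specific, we
define the relevant register expressions among a set of expressions $\Psi$ by:

$$\forall \Psi \subseteq \Theta_{\dot n}.\quad \mathcal{R}^{\mathrm{lo}}_{\dot n}(\Psi) := \{\varphi \in \Psi \mid \ell_{\dot n}(\varphi) = \bullet\}$$

and the relevant initialisation candidates among a set of expressions $\Psi$ by:

$$\forall \Psi \subseteq \Theta_{\dot n}.\quad \mathcal{R}^{\mathrm{up}}_{\dot n}(\Psi) := \{\varphi \in \Psi \mid \ell_{\dot n}(\varphi) = \square \vee \exists \psi \in \mathit{SupExpr}_{\Theta_{\dot n}}(\varphi).\ \psi \notin \Psi\}$$

With these notions the difference $|\mathcal{R}^{\mathrm{lo}}_{\dot n}(S^{\mathrm{lo}}_{\dot n})| - |\mathcal{R}^{\mathrm{up}}_{\dot n}(S^{\mathrm{up}}_{\dot n})|$ defines the gain of
the trade-off pair $(S^{\mathrm{lo}}_{\dot n}, S^{\mathrm{up}}_{\dot n})$. This is motivated by the fact that expressions
in $\mathcal{R}^{\mathrm{lo}}_{\dot n}(S^{\mathrm{lo}}_{\dot n})$ actually set free an occupied symbolic register, since
Condition (3) of Definition 4.5.1 ensures that all their superexpressions are
properly initialised. Those in $\mathcal{R}^{\mathrm{up}}_{\dot n}(S^{\mathrm{up}}_{\dot n})$, in contrast, start up a new lifetime
range at ṅ. This is based on the fact that initialisations of temporaries belonging
to expressions of type □ always induce a lifetime range that comprises ṅ. On
the other hand, an initialisation of a temporary that belongs to an expression
of type ○ has a local lifetime range if and only if all its superexpressions are
initialised at ṅ as well.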

Hence our optimisation problem is:

Problem 4.5.1 (Determining an Optimal Trade-off Pair).

Instance: A labelled expression DAG $D_{\dot n} = (\Theta_{\dot n}, E_{\dot n}, \ell_{\dot n})$.

Goal: Find an optimal trade-off pair $(S^{\mathrm{lo}}_{\dot n}, S^{\mathrm{up}}_{\dot n})$, i.e. one with maximal gain.
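The conditions of Definition 4.5.1, the gain and Problem 4.5.1 can be checked mechanically on small DAGs. The Python sketch below is meant only as an executable restatement of the definitions and rests on these assumptions:

- $D_{\dot n}$ is given as a label dictionary plus a set of (superexpression, subexpression) edge pairs;
- $\mathit{SupExpr}_{\Theta_{\dot n}}$ is approximated by the DAG predecessors (immediate superexpressions).

The brute-force solver is exponential. For a fixed $S^{\mathrm{up}}_{\dot n}$ it simply takes every covered register expression, since adding ● vertices never lowers the gain and adding ■ vertices does not change it.

```python
from itertools import combinations

REGISTER, INIT = {'●', '■'}, {'○', '□'}

def preds(edges, v):
    return {u for (u, w) in edges if w == v}

def succs(edges, v):
    return {w for (u, w) in edges if u == v}

def is_trade_off_pair(labels, edges, s_lo, s_up):
    """Conditions (1)-(4) of Definition 4.5.1; edges are (super, sub) pairs."""
    return (all(labels[v] in REGISTER for v in s_lo)                        # (1)
            and all(labels[v] in INIT for v in s_up)                         # (2)
            and all(preds(edges, v) <= s_up for v in s_lo)                   # (3)
            and all(w in s_up for v in s_up for w in succs(edges, v)
                    if labels[w] in INIT))                                   # (4)

def gain(labels, edges, s_lo, s_up):
    """|R_lo(S_lo)| - |R_up(S_up)|, superexpressions taken as DAG predecessors."""
    r_lo = {v for v in s_lo if labels[v] == '●'}
    r_up = {v for v in s_up if labels[v] == '□' or not preds(edges, v) <= s_up}
    return len(r_lo) - len(r_up)

def best_pair_brute_force(labels, edges):
    """Exponential reference solver for Problem 4.5.1 (tiny DAGs only)."""
    regs = [v for v in labels if labels[v] in REGISTER]
    inits = sorted(v for v in labels if labels[v] in INIT)
    best = (0, set(), set())                     # the empty pair has gain 0
    for k in range(len(inits) + 1):
        for combo in combinations(inits, k):
            s_up = set(combo)
            s_lo = {v for v in regs if preds(edges, v) <= s_up}
            if is_trade_off_pair(labels, edges, s_lo, s_up):
                g = gain(labels, edges, s_lo, s_up)
                if g > best[0]:
                    best = (g, s_lo, s_up)
    return best
```

As an example, let x be a ○-vertex with the two ●-operands a and b. Let x be a subexpression of the □-vertex y, which also has a ■-operand c. Initialising x prematurely releases a and b at the cost of one relevant lifetime, for a gain of 1.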

The notion is illustrated by means of Figure 4.14(a), which shows a labelled
expression DAG with respect to a program point ṅ. Figure 4.14(b) shows
the optimal trade-off pair, which is given by ({1, 2, 4, 5}, {6, 7, 8, 9, 10, 12, 13}).
Actually, it is even the only trade-off pair that achieves a positive gain in this
example. The relevant part of $S^{\mathrm{up}}_{\dot n}$ is $\mathcal{R}^{\mathrm{up}}_{\dot n}(S^{\mathrm{up}}_{\dot n}) = \{8, 12, 13\}$. It is worth noting
that vertex 8 is of type ○. Nonetheless, its initialisation is more profitable
than initialisations with respect to its superexpressions. These would force the
inclusion of two additional relevant initialisation candidates, namely vertices
11 and 3. The vertices 6, 7, 9 and 10 do not impose costs, since they only
represent internal calculations.

Condition (4) of Definition 4.5.1 is necessary in order to ensure that
subexpressions are always computed before the expression itself, which is a
basic requirement of structured expression motion (cf. Definition 4.2.1).




Fig. 4.14. a) A labelled expression DAG b) An optimal trade-off pair 



Reduction to Bipartite Graphs. At first glance, Problem 4.5.1 appears
to be harder than the problem of finding tight sets of bipartite graphs (cf.
Problem 4.3.2). However, the conditions imposed on the structure of labelled
expression DAGs and trade-off pairs allow us to reduce labelled expression DAGs
to bipartite graphs in such a way that the optimisation problem can be solved by
means of the matching-based technique of Algorithm 4.3.1.

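Algorithm 4.5.1.

Input: Labelled expression DAG $D_{\dot n} = (\Theta_{\dot n}, E_{\dot n}, \ell_{\dot n})$.
Output: Induced bipartite graph $B_{\dot n} = (\Theta^{\mathrm{lo}}_{\dot n} \uplus \Theta^{\mathrm{up}}_{\dot n}, E^{B}_{\dot n})$.

1. Set $\Theta^{\mathrm{lo}}_{\dot n} := \{\varphi \in \Theta_{\dot n} \mid \ell_{\dot n}(\varphi) \in \{\bullet, \circ\}\}$ and $\Theta^{\mathrm{up}}_{\dot n} := \{\varphi \in \Theta_{\dot n} \mid \ell_{\dot n}(\varphi) \in \{\circ, \square\}\}$. For expressions $\varphi$ belonging to both $\Theta^{\mathrm{lo}}_{\dot n}$ and $\Theta^{\mathrm{up}}_{\dot n}$ we assume that distinct copies are created in order to keep $\Theta^{\mathrm{lo}}_{\dot n}$ and $\Theta^{\mathrm{up}}_{\dot n}$ disjoint.

2. Set $E^{B}_{\dot n} := \{\{\varphi, \psi\} \mid \varphi \in \Theta^{\mathrm{lo}}_{\dot n} \wedge \psi \in \Theta^{\mathrm{up}}_{\dot n} \wedge \psi \in succ^{*}_{D_{\dot n}}(pred_{D_{\dot n}}(\varphi))\}$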

Essentially, the algorithm puts release candidates into the partition $\Theta^{\mathrm{lo}}_{\dot n}$ and
initialisation candidates into $\Theta^{\mathrm{up}}_{\dot n}$. In particular, it is worth noting that
$\Theta_{\dot n}$-vertices of type ○ are common to both levels. Informally, this ensures
that non-relevant initialisation candidates in $\Theta^{\mathrm{up}}_{\dot n}$ can be compensated by
corresponding release candidates in $\Theta^{\mathrm{lo}}_{\dot n}$.

Some edges of $E^{B}_{\dot n}$ are directly induced by corresponding edges of $E_{\dot n}$.
However, in order to project the internal structure of $D_{\dot n}$ onto the two levels
of $B_{\dot n}$, an edge $\{\varphi, \psi\}$ with $\psi \in \Theta^{\mathrm{up}}_{\dot n}$ additionally induces corresponding edges
between $\varphi$ and the $\Theta^{\mathrm{up}}_{\dot n}$-elements in the successor-closure of $\psi$.
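Algorithm 4.5.1 translates almost literally into code. The Python sketch below is an illustration under two representational assumptions:

- labels and (superexpression, subexpression) edges are given as in the earlier trade-off sketch;
- vertex copies are tagged `('lo', v)` and `('up', v)`, so that ○-vertices can appear on both sides as step 1 requires.

Step 2 is realised with the reflexive-transitive successor closure of the DAG predecessors.

```python
def reduce_to_bipartite(labels, edges):
    """Sketch of Algorithm 4.5.1: labels maps vertex -> one of ●■○□, edges
    holds (superexpression, subexpression) pairs of the labelled DAG."""
    lo = {v for v, lab in labels.items() if lab in ('●', '○')}
    up = {v for v, lab in labels.items() if lab in ('○', '□')}
    succ = {v: set() for v in labels}
    pred = {v: set() for v in labels}
    for (u, w) in edges:
        succ[u].add(w)
        pred[w].add(u)

    def succ_closure(start):
        """Reflexive-transitive closure of the vertex set start under succ."""
        seen, stack = set(start), list(start)
        while stack:
            for w in succ[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    bip_edges = {(('lo', v), ('up', w))
                 for v in lo for w in succ_closure(pred[v]) if w in up}
    return {('lo', v) for v in lo}, {('up', w) for w in up}, bip_edges
```

In the small example with x = a op b and y = x op c, the ○-vertex x receives an edge to its own upper copy as well as to y. This is exactly the "pseudo-operand" effect described above.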
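Remark 4.5.1. Obviously, $\Theta^{\mathrm{lo}}_{\dot n}$ and $\Theta^{\mathrm{up}}_{\dot n}$ can be written as:

$$\Theta^{\mathrm{lo}}_{\dot n} = \{\varphi \in \Theta_{\dot n} \mid \neg\mathit{UsedLater}^\varphi_{\dot n}\}$$

$$\Theta^{\mathrm{up}}_{\dot n} = \{\varphi \in \Theta_{\dot n} \mid \mathit{Delayed}^\varphi_{\dot n} \wedge \neg\mathit{Latest}^\varphi_{\dot n}\}$$

These formulations are almost completely along the lines of the definitions of
the corresponding levelwise sets on page 61, which makes it quite easy to carry
over major parts of the reasoning on trade-off candidates to the new situation.
Parts where differences become essential, however, are elaborated in full detail.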

The central result emphasising the role of the reduction of Algorithm 4.5.1 
is: 

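Theorem 4.5.1 (Reduction Theorem). Let $D_{\dot n} = (\Theta_{\dot n}, E_{\dot n}, \ell_{\dot n})$ be a labelled
expression DAG associated with $\dot n \in \dot N$ and let $B_{\dot n} = (\Theta^{\mathrm{lo}}_{\dot n} \uplus \Theta^{\mathrm{up}}_{\dot n}, E^{B}_{\dot n})$
be the corresponding bipartite graph (computed by means of Algorithm 4.5.1).
Then we have:

1. If $(S^{\mathrm{lo}}_{\dot n}, S^{\mathrm{up}}_{\dot n})$ is an optimal trade-off pair of $D_{\dot n}$, then the subset of $\Theta^{\mathrm{lo}}_{\dot n}$ defined by $\mathcal{R}^{\mathrm{lo}}_{\dot n}(S^{\mathrm{lo}}_{\dot n}) \uplus (S^{\mathrm{up}}_{\dot n} \setminus \mathcal{R}^{\mathrm{up}}_{\dot n}(S^{\mathrm{up}}_{\dot n}))$ is a tight set. In particular, its deficiency is equal to the gain of $(S^{\mathrm{lo}}_{\dot n}, S^{\mathrm{up}}_{\dot n})$.

2. If $\Theta^{\mathrm{lo}\prime}_{\dot n} \subseteq \Theta^{\mathrm{lo}}_{\dot n}$ is a tight set, then $(\mathcal{R}^{\mathrm{lo}}_{\dot n}(\Theta^{\mathrm{lo}\prime}_{\dot n}), neigh(\Theta^{\mathrm{lo}\prime}_{\dot n}))$ is a trade-off pair of $D_{\dot n}$ whose gain coincides with the deficiency of $\Theta^{\mathrm{lo}\prime}_{\dot n}$.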

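Proof. Recall that the deficiency of a subset $\Psi \subseteq \Theta^{\mathrm{lo}}_{\dot n}$ is defined as $|\Psi| - |neigh(\Psi)|$.
It is sufficient to show:

A) For every trade-off pair $(S^{\mathrm{lo}}_{\dot n}, S^{\mathrm{up}}_{\dot n})$ the set $\mathcal{R}^{\mathrm{lo}}_{\dot n}(S^{\mathrm{lo}}_{\dot n}) \uplus (S^{\mathrm{up}}_{\dot n} \setminus \mathcal{R}^{\mathrm{up}}_{\dot n}(S^{\mathrm{up}}_{\dot n}))$ defines a subset of $\Theta^{\mathrm{lo}}_{\dot n}$ whose deficiency is at least as large as the gain of $(S^{\mathrm{lo}}_{\dot n}, S^{\mathrm{up}}_{\dot n})$.

B) Every subset $\Theta^{\mathrm{lo}\prime}_{\dot n} \subseteq \Theta^{\mathrm{lo}}_{\dot n}$ induces a trade-off pair $(\mathcal{R}^{\mathrm{lo}}_{\dot n}(\Theta^{\mathrm{lo}\prime}_{\dot n}), neigh(\Theta^{\mathrm{lo}\prime}_{\dot n}))$ whose gain is at least as large as the deficiency of $\Theta^{\mathrm{lo}\prime}_{\dot n}$.

Let us start with the proof of (A). With $\Theta^{\mathrm{lo}\prime}_{\dot n} := \mathcal{R}^{\mathrm{lo}}_{\dot n}(S^{\mathrm{lo}}_{\dot n}) \uplus (S^{\mathrm{up}}_{\dot n} \setminus \mathcal{R}^{\mathrm{up}}_{\dot n}(S^{\mathrm{up}}_{\dot n}))$
we first have

$$neigh(\Theta^{\mathrm{lo}\prime}_{\dot n}) \subseteq S^{\mathrm{up}}_{\dot n} \qquad (4.32)$$

To show this inclusion we first exploit that Definition 4.5.1(3) ensures
$pred_{D_{\dot n}}(\varphi) \subseteq S^{\mathrm{up}}_{\dot n}$ if $\varphi$ is a register expression in $\mathcal{R}^{\mathrm{lo}}_{\dot n}(S^{\mathrm{lo}}_{\dot n})$, while the
definition of relevancy yields the same if $\varphi$ is an irrelevant vertex in
$S^{\mathrm{up}}_{\dot n} \setminus \mathcal{R}^{\mathrm{up}}_{\dot n}(S^{\mathrm{up}}_{\dot n})$. Finally, Definition 4.5.1(4) even grants
$succ^{*}_{D_{\dot n}}(pred_{D_{\dot n}}(\varphi)) \cap \Theta^{\mathrm{up}}_{\dot n} \subseteq S^{\mathrm{up}}_{\dot n}$ for the successor-closure. Hence, according to
the construction of $E^{B}_{\dot n}$, Inclusion (4.32) is satisfied. Now we can finish the proof of (A) by calculating:

$$\begin{aligned}
& |\Theta^{\mathrm{lo}\prime}_{\dot n}| - |neigh(\Theta^{\mathrm{lo}\prime}_{\dot n})| \\
\geq\ & |\mathcal{R}^{\mathrm{lo}}_{\dot n}(S^{\mathrm{lo}}_{\dot n}) \uplus (S^{\mathrm{up}}_{\dot n} \setminus \mathcal{R}^{\mathrm{up}}_{\dot n}(S^{\mathrm{up}}_{\dot n}))| - |S^{\mathrm{up}}_{\dot n}| && [\text{Def. } \Theta^{\mathrm{lo}\prime}_{\dot n} \text{ \& Inclusion (4.32)}]\\
=\ & |\mathcal{R}^{\mathrm{lo}}_{\dot n}(S^{\mathrm{lo}}_{\dot n})| - |\mathcal{R}^{\mathrm{up}}_{\dot n}(S^{\mathrm{up}}_{\dot n})| && [\mathcal{R}^{\mathrm{up}}_{\dot n}(S^{\mathrm{up}}_{\dot n}) \subseteq S^{\mathrm{up}}_{\dot n}]
\end{aligned}$$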
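For the proof of (B) let us set $S^{\mathrm{lo}}_{\dot n} := \mathcal{R}^{\mathrm{lo}}_{\dot n}(\Theta^{\mathrm{lo}\prime}_{\dot n})$ and $S^{\mathrm{up}}_{\dot n} := neigh(\Theta^{\mathrm{lo}\prime}_{\dot n})$.

It is easy to check that $(S^{\mathrm{lo}}_{\dot n}, S^{\mathrm{up}}_{\dot n})$ is a trade-off pair, i.e. fulfils the four
requirements of Definition 4.5.1. Points (1) and (2) hold trivially and Point
(3) is immediate by construction. Point (4), however, is again due to the
successor-closure that is employed in the definition of $E^{B}_{\dot n}$.

As we have $neigh(\varphi) \subseteq S^{\mathrm{up}}_{\dot n}$ for each $\varphi \in \Theta^{\mathrm{lo}\prime}_{\dot n}$, this particularly means
$\mathit{SupExpr}_{\Theta_{\dot n}}(\varphi) \subseteq S^{\mathrm{up}}_{\dot n}$ for each $\varphi \in \Theta^{\mathrm{lo}\prime}_{\dot n}$. All $\varphi \in \Theta^{\mathrm{lo}\prime}_{\dot n} \setminus S^{\mathrm{lo}}_{\dot n}$ are of type ○. Hence,
according to the definition of relevant initialisation candidates, we get:

$$\forall \varphi \in \Theta^{\mathrm{lo}\prime}_{\dot n} \setminus S^{\mathrm{lo}}_{\dot n}.\ \varphi \notin \mathcal{R}^{\mathrm{up}}_{\dot n}(S^{\mathrm{up}}_{\dot n})$$

In particular, this induces the following inclusion:

$$\mathcal{R}^{\mathrm{up}}_{\dot n}(S^{\mathrm{up}}_{\dot n}) \subseteq S^{\mathrm{up}}_{\dot n} \setminus (\Theta^{\mathrm{lo}\prime}_{\dot n} \setminus S^{\mathrm{lo}}_{\dot n}) \qquad (4.33)$$

Now we can calculate:

$$\begin{aligned}
& |\mathcal{R}^{\mathrm{lo}}_{\dot n}(S^{\mathrm{lo}}_{\dot n})| - |\mathcal{R}^{\mathrm{up}}_{\dot n}(S^{\mathrm{up}}_{\dot n})| \\
=\ & |S^{\mathrm{lo}}_{\dot n}| - |\mathcal{R}^{\mathrm{up}}_{\dot n}(S^{\mathrm{up}}_{\dot n})| && [\text{Definition } S^{\mathrm{lo}}_{\dot n}]\\
\geq\ & |S^{\mathrm{lo}}_{\dot n}| - |S^{\mathrm{up}}_{\dot n} \setminus (\Theta^{\mathrm{lo}\prime}_{\dot n} \setminus S^{\mathrm{lo}}_{\dot n})| && [\text{Inclusion (4.33)}]\\
=\ & |S^{\mathrm{lo}}_{\dot n}| - (|S^{\mathrm{up}}_{\dot n}| - (|\Theta^{\mathrm{lo}\prime}_{\dot n}| - |S^{\mathrm{lo}}_{\dot n}|)) && [S^{\mathrm{lo}}_{\dot n} \subseteq \Theta^{\mathrm{lo}\prime}_{\dot n},\ \Theta^{\mathrm{lo}\prime}_{\dot n} \setminus S^{\mathrm{lo}}_{\dot n} \subseteq S^{\mathrm{up}}_{\dot n}]\\
=\ & |\Theta^{\mathrm{lo}\prime}_{\dot n}| - |neigh(\Theta^{\mathrm{lo}\prime}_{\dot n})| && [\text{Definition } S^{\mathrm{up}}_{\dot n}]
\end{aligned}$$

□

We illustrate the message of this theorem by means of the example of Figure
4.14. Figure 4.15 shows the corresponding bipartite graph together with its
unique tight set {1, 2, 4, 5, 6, 7, 9, 10}. Note that the number of edges is quite
large due to the closure operation applied in Algorithm 4.5.1. The irrelevant
$S^{\mathrm{up}}_{\dot n}$-vertices of Figure 4.14, namely {6, 7, 9, 10}, are now reflected by vertices
that are common to both levels. Informally, this way the costs imposed
by irrelevant upper trade-off candidates are compensated by corresponding
lower trade-off candidates, which can be considered pseudo-operands.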
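The text relies on the matching-based Algorithm 4.3.1, whose details are not repeated here. As an independent illustration, the Python sketch below instead uses a standard textbook combination, assuming (as the proof above suggests) that a tight set is a subset of $\Theta^{\mathrm{lo}}_{\dot n}$ of maximal deficiency. It works in three steps:

1. compute a maximum matching with Kuhn's augmenting-path algorithm;
2. exclude every lower vertex reachable from an unmatched upper vertex by an alternating path;
3. return the remaining lower vertices, which form the largest subset of maximal deficiency.

The names `max_matching`, `largest_tight_set` and the brute-force checker are hypothetical.

```python
from itertools import combinations

def max_matching(lo, adj):
    """Kuhn's augmenting-path algorithm; adj maps a lower vertex to its upper
    neighbours. Returns the matching as a dict upper -> lower."""
    match_up = {}

    def augment(v, seen):
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                if w not in match_up or augment(match_up[w], seen):
                    match_up[w] = v
                    return True
        return False

    for v in lo:
        augment(v, set())
    return match_up

def largest_tight_set(lo, adj):
    """Largest subset of lo with maximal deficiency |X| - |neigh(X)|."""
    match_up = max_matching(lo, adj)
    match_lo = {v: w for w, v in match_up.items()}
    neighbours_of_up = {}
    for v in lo:
        for w in adj[v]:
            neighbours_of_up.setdefault(w, set()).add(v)
    # Lower vertices reachable from an unmatched upper vertex by alternating
    # paths belong to no maximum-deficiency set (they are all matched).
    stack = [w for w in neighbours_of_up if w not in match_up]
    seen_up, excluded = set(stack), set()
    while stack:
        for v in neighbours_of_up[stack.pop()]:
            if v not in excluded:
                excluded.add(v)
                w2 = match_lo[v]              # follow the matching edge upwards
                if w2 not in seen_up:
                    seen_up.add(w2)
                    stack.append(w2)
    return set(lo) - excluded

def deficiency(xs, adj):
    """|X| - |neigh(X)| for a subset X of the lower partition."""
    return len(xs) - len(set().union(*(adj[v] for v in xs)))

def max_deficiency_brute_force(lo, adj):
    """Exponential reference value, for cross-checking on tiny graphs."""
    return max(deficiency(set(c), adj)
               for k in range(len(lo) + 1) for c in combinations(lo, k))
```

The maximal deficiency equals the number of lower vertices left unmatched by a maximum matching. In the terms of the reduction, the neighbourhood of the returned set is the set of initialisation candidates of the corresponding trade-off pair.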



4.5.4 Defining FFEM_Φ

As for its counterpart LFEM_Φ, the trade-offs by themselves only cope with the
local situation at a given program point ṅ. However, they can be employed for
a global algorithm that determines fully lifetime optimal computation
points of expressions somewhere in between their earliest and their latest
initialisation points. We again succeed with a globalisation that is particularly
easy to implement, as the local trade-off information can be fed directly into
an adjustment of the delayability predicate. Actually, the refinement process
is even simpler than the one used for the definition of LFEM_Φ. This is due
to the fact that there is now no need for a first adjustment, which in the
definition of LFEM_Φ was necessary in order to ensure that large expressions
are initialised only at those program points where all their subexpressions
are already initialised. Now this condition is automatically fulfilled, as
Part (4) of Definition 4.5.1 ensures that the premature initialisation of a large
trade-off candidate always forces the prior initialisation of its subexpressions,
too.

Fig. 4.15. The bipartite graph corresponding to Figure 4.14 with largest tight
subset

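The Transformation. The key to the definition of Fully Flexible Expression
Motion (FFEM_Φ) is a refinement of the delay predicate that takes into
account opportunities for profitable trade-offs. The local trade-off information
at a program point ṅ gives rise to a predicate $\mathit{Change}^\varphi_{\dot n}$ indicating whether φ should
be initialised prematurely due to a profitable trade-off between lifetimes of
temporaries. Formally, it is defined similarly to its counterpart on page 62:

$$\mathit{Change}^\varphi_{\dot n} :\Leftrightarrow \varphi \in neigh(T_{\dot n}),$$

where the tight set $T_{\dot n}$ and $neigh$ now refer to the underlying bipartite graph at ṅ
computed by means of Algorithm 4.5.1. Then we have

$$\mathit{A\text{-}Delayed}^\varphi_{\dot n} :\Leftrightarrow \forall p \in \mathbf{P}[s, \dot n]\ \exists 1 \le j \le |p|.\ \mathit{Earliest}^\varphi_{p_j} \wedge \forall j \le k < |p|.\ \neg\mathit{Comp}^\varphi_{p_k} \wedge \neg\mathit{Change}^\varphi_{p_k}$$

Again the adjusted delayability predicate induces a corresponding predicate
for latestness, $\mathit{A\text{-}Latest}^\varphi_{\dot n}$, which finally determines the insertion points of
FFEM_Φ. Thus we have for any $\varphi \in \Phi$ and $\dot n \in \dot N$:

$$\mathrm{FFEM}_\Phi\text{-}\mathit{Replace}^\varphi_{\dot n} :\Leftrightarrow \varphi \in \mathit{MaxSubExpr}_\Phi(\varphi^{RHS}_{\dot n}),$$

where $\varphi^{RHS}_{\dot n}$ refers to the right-hand side expression associated with ṅ, and

$$\mathrm{FFEM}_\Phi\text{-}\mathit{Insert}^\varphi_{\dot n} :\Leftrightarrow \mathit{A\text{-}Latest}^\varphi_{\dot n}.$$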
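As an illustration of how the local trade-off information enters the global transformation, the following Python sketch computes A-Delayed as a greatest fixed point over a graph of program points, for one expression. It assumes that Earliest, Comp and Change are given per point. A-Latest is taken here as A-Delayed together with Comp or a successor that is not A-Delayed. This mirrors how Latest is derived from Delayed and is an assumption of this sketch, since the text does not spell A-Latest out at this point.

```python
def ffem_insertions(nodes, pred, succ, start, earliest, comp, change):
    """Sketch of A-Delayed (greatest fixed point) and the induced A-Latest for one
    expression; all predicate arguments map program point -> bool, and 'change'
    is assumed to be precomputed from the tight sets at each point."""
    a_delayed = {n: True for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            # delayable if earliest here, or delayable on every incoming path
            # through a predecessor that neither computes nor trades off
            from_preds = n != start and all(
                a_delayed[m] and not comp[m] and not change[m] for m in pred[n])
            new = earliest[n] or from_preds
            if new != a_delayed[n]:
                a_delayed[n] = new
                changed = True
    a_latest = {n: a_delayed[n] and (comp[n] or any(not a_delayed[m] for m in succ[n]))
                for n in nodes}
    return a_delayed, a_latest
```

Consider a chain of points 1 to 4 with Earliest at point 1 and the computation at point 4. Without any Change, the insertion lands lazily at point 4. If Change holds at point 2, delayability stops behind point 2 and the insertion moves up to point 2.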
Similar to LFEM_Φ, FFEM_Φ has to be proved admissible, computationally
optimal and, finally, also lifetime optimal. Fortunately, most
parts of the proofs of Section 4.4 carry over quite easily to the new situation.
For this reason we only point to the corresponding proofs whenever the
reasoning is straightforward to adapt.

4.5.5 Proving Admissibility 

Surprisingly, here the situation is simpler than it was for LFEM_Φ. In
particular, we get along without all of the results dealing with properties of the
first adjustment. On the other hand, this part of LFEM_Φ automatically enforced
the subexpression criterion of structured expression motion. In contrast, the
argumentation for FFEM_Φ is not as straightforward. We briefly summarise the
main results that are necessary in order to establish admissibility of FFEM_Φ.

As counterparts to Lemma 4.4.1(1) and Lemma 4.4.2(1&2) we have:

Lemma 4.5.2 (Delayability Lemma for FFEM_Φ).

$$\forall \psi \in \Phi,\ \dot n \in \dot N.\ \mathit{A\text{-}Delayed}^\psi_{\dot n} \Rightarrow \mathit{Delayed}^\psi_{\dot n}$$

Lemma 4.5.3 (Latestness Lemma for FFEM_Φ).

1. $\forall \psi \in \Phi,\ \dot n \in \dot N.\ \mathit{Latest}^\psi_{\dot n} \Rightarrow \mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^\psi_{\dot n}$

2. $\forall \psi \in \Phi,\ \dot n \in \dot N.\ \mathit{Latest}^\psi_{\dot n} \Rightarrow \forall \varphi \in \mathit{SubExpr}_\Phi(\psi).\ \mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^\varphi_{\dot n}$

Both lemmas are proved straightforwardly along the lines of the corresponding
versions for LFEM_Φ.

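Moreover, in analogy to LFEM_Φ, the central characterisation of trade-offs,
Trade-off Lemma 4.4.8, has an almost identical counterpart in terms of
Lemma 4.5.4(1). Compared to LFEM_Φ, however, we use this result at an
earlier stage, as it helps to keep the reasoning on adjusted delayability local.
In addition, Trade-off Lemma 4.5.4 is now supplemented by a second part,
which can be considered as a completion of the Structured Local Property
Lemma 4.2.1.

Lemma 4.5.4 (Trade-off Lemma for FFEM_Φ).

1. $\forall \dot n \in \dot N,\ \psi \in \Theta^{\mathrm{up}}_{\dot n}.\ \mathit{Change}^\psi_{\dot n} \Leftrightarrow \forall \dot m \in succ(\dot n).\ \neg\mathit{A\text{-}Delayed}^\psi_{\dot m} \Leftrightarrow \exists \dot m \in succ(\dot n).\ \neg\mathit{A\text{-}Delayed}^\psi_{\dot m}$

2. $\forall \dot n \in \dot N,\ \psi \in \Theta^{\mathrm{up}}_{\dot n}.\ \mathit{Change}^\psi_{\dot n} \Rightarrow \forall \varphi \in \mathit{SubExpr}_{\Theta^{\mathrm{up}}_{\dot n}}(\psi).\ \mathit{Change}^\varphi_{\dot n}$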



Since the trade-off information can be computed completely in advance, the 
Change predicate can be considered the third local property besides Transp 
and Comp. From this point of view the second part of the Trade-off Lemma is 
perfectly compatible with Lemma 4.2.1. 
The second part is immediately due to Theorem 4.5.1(2), which grants that
the expressions satisfying the Change predicate are exactly the initialisation
candidates of the corresponding trade-off pair, together with Definition
4.5.1(4), which ensures that initialisation candidates are closed with respect to
the subexpression relation.

Part (1) can be proved almost exactly as Lemma 4.4.8. The characterisation of
the upper and lower trade-off candidates given in Remark 4.5.1 allows us to use
the same reasoning, where $\Theta^{\mathrm{up}}_{\dot n}$ and $\Theta^{\mathrm{lo}}_{\dot n}$ now take the role of their levelwise
counterparts. For the second equivalence, i.e. the equivalence of existential and
universal quantification, see Remark 4.4.1. Furthermore, the proof of this point
also requires an adapted version of Lemma 4.4.7 (now restricted to Parts (1)
and (4)), which reads as follows:

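Lemma 4.5.5 (Trade-off Candidate Lemma for FFEM_Φ).

1. $\forall \dot n \in \dot N,\ \dot m \in pred(succ(\dot n)).\ \Theta^{\mathrm{up}}_{\dot n} = \Theta^{\mathrm{up}}_{\dot m}$

2. a) $\forall \dot n \in \dot N,\ \psi \in \Theta^{\mathrm{up}}_{\dot n},\ \dot n_s \in succ(\dot n).\ \neg\mathit{Earliest}^\psi_{\dot n_s}$

   b) $\forall \dot n \in \dot N,\ \varphi \in neigh(\Theta^{\mathrm{up}}_{\dot n}),\ \dot n_s \in succ(\dot n).\ \neg\mathit{Earliest}^\varphi_{\dot n_s}$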

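Again the proofs of Lemma 4.4.7 carry over straightforwardly by leaving
out all the parts involving special reasoning on the first adjustment.
However, we have to be careful with the proof of Part (1), since the proof of
Lemma 4.4.7(1) uses Lemma 4.4.4, a result whose counterpart is not
available yet. Fortunately, here we succeed without such an argument. For
$succ(\dot n) = \{\dot n_s\}$ we have:

$$\begin{aligned}
& \psi \in \Theta^{\mathrm{up}}_{\dot n} \\
\Leftrightarrow\ & \mathit{Delayed}^\psi_{\dot n} \wedge \neg\mathit{Latest}^\psi_{\dot n} && [\text{Definition } \Theta^{\mathrm{up}}_{\dot n}]\\
\Leftrightarrow\ & \mathit{Delayed}^\psi_{\dot n_s} \wedge \neg\mathit{Earliest}^\psi_{\dot n_s} && [\text{Definition Latest \& Lemma 3.4.2(2)}]\\
\Leftrightarrow\ & \forall \dot m \in pred(\dot n_s).\ \mathit{Delayed}^\psi_{\dot m} \wedge \neg\mathit{Latest}^\psi_{\dot m} && [\text{Definition Delayed \& Latest}]\\
\Leftrightarrow\ & \forall \dot m \in pred(succ(\dot n)).\ \mathit{Delayed}^\psi_{\dot m} \wedge \neg\mathit{Latest}^\psi_{\dot m} && [succ(\dot n) = \{\dot n_s\}]\\
\Leftrightarrow\ & \forall \dot m \in pred(succ(\dot n)).\ \psi \in \Theta^{\mathrm{up}}_{\dot m} && [\text{Definition } \Theta^{\mathrm{up}}]
\end{aligned}$$

□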

As the equivalent to Theorem 4.4.1 we finally get:

Theorem 4.5.2 (Transformation Theorem for FFEM_Φ). FFEM_Φ is

1. a structured expression motion transformation in the sense of Definition 4.2.1 and

2. admissible, i.e. $\mathrm{FFEM}_\Phi \in \mathcal{AEM}_\Phi$.
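Proof. In contrast to LFEM_Φ, the first point is not as obvious. We have to
show

i) $\forall \dot n \in \dot N,\ \psi \in \Phi.\ \mathrm{FFEM}_\Phi\text{-}\mathit{Insert}^\psi_{\dot n} \Rightarrow \forall \varphi \in \mathit{SubExpr}_\Phi(\psi).\ \mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^\varphi_{\dot n}$.

Let us assume $\mathrm{FFEM}_\Phi\text{-}\mathit{Insert}^\psi_{\dot n}$ and $\varphi \in \mathit{SubExpr}_\Phi(\psi)$. Immediately by the
corresponding definitions we obtain the following trivial sequence of implications:

$$\mathrm{FFEM}_\Phi\text{-}\mathit{Insert}^\psi_{\dot n} \Rightarrow \mathit{A\text{-}Latest}^\psi_{\dot n} \Rightarrow \mathit{A\text{-}Delayed}^\psi_{\dot n} \Rightarrow \mathit{Delayed}^\psi_{\dot n} \qquad (4.34)$$

Using Lemma 4.2.5(1) we then obtain

$$\underbrace{\mathit{Delayed}^\varphi_{\dot n}}_{(a)} \vee \underbrace{\mathrm{LEM}_\Phi\text{-}\mathit{Correct}^\varphi_{\dot n}}_{(b)}$$

In the case of (b) we easily conclude $\mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^\varphi_{\dot n}$ by applying Lemma
4.5.3(1) to the path characterisation of $\mathrm{LEM}_\Phi\text{-}\mathit{Correct}^\varphi_{\dot n}$. Thus we are left with
(a). The case where $\neg\mathit{A\text{-}Delayed}^\varphi_{\dot n}$ holds is also subsumed by (b), as in this
case the path characterisations of Delayed and A-Delayed deliver

$$\mathit{Delayed}^\varphi_{\dot n} \wedge \neg\mathit{A\text{-}Delayed}^\varphi_{\dot n} \Rightarrow \mathrm{LEM}_\Phi\text{-}\mathit{Correct}^\varphi_{\dot n}.$$

A formal proof of this implication would be almost the same as the one of
Part (1) of Lemma 4.5.3. Similarly, Lemma 4.5.3(1) directly grants that the
situation where $\mathit{Latest}^\varphi_{\dot n}$ holds is also covered by (b). Thus we are left with a
situation where (a) reduces to

$$\mathit{A\text{-}Delayed}^\varphi_{\dot n}$$

Now we are going to show that $\mathit{A\text{-}Latest}^\varphi_{\dot n}$ holds, which trivially implies
$\mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^\varphi_{\dot n}$. To this end we use the fact that $\mathit{A\text{-}Latest}^\psi_{\dot n}$ implies:

$$\underbrace{\mathit{Comp}^\psi_{\dot n}}_{(c)} \vee \underbrace{\exists \dot m \in succ(\dot n).\ \neg\mathit{A\text{-}Delayed}^\psi_{\dot m}}_{(d)}$$

Lemma 4.2.1(2) delivers that $\mathit{Comp}^\varphi_{\dot n}$ follows from (c), which immediately
establishes the desired result $\mathit{A\text{-}Latest}^\varphi_{\dot n}$. So let us investigate the remaining
case (d). Due to Lemma 4.2.6 we may assume $\neg\mathit{Latest}^\psi_{\dot n}$, as otherwise (b) would
again become true. This, however, means $\psi \in \Theta^{\mathrm{up}}_{\dot n}$, which allows us to apply
Lemma 4.5.4(1), yielding $\mathit{Change}^\psi_{\dot n}$. According to Lemma 4.5.4(2) this induces
$\mathit{Change}^\varphi_{\dot n}$. Applying Lemma 4.5.4(1) once more, we finally get $\mathit{A\text{-}Latest}^\varphi_{\dot n}$ as
desired.

In order to prove the second part of the theorem we have to show that
initialisations are safe and replacements are correct, i.e.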
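ii) $\forall \psi \in \Phi,\ \dot n \in \dot N.\ \mathrm{FFEM}_\Phi\text{-}\mathit{Insert}^\psi_{\dot n} \Rightarrow \mathit{Safe}^\psi_{\dot n}$

iii) $\forall \psi \in \Phi,\ \dot n \in \dot N.\ \mathrm{FFEM}_\Phi\text{-}\mathit{Replace}^\psi_{\dot n} \Rightarrow \mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^\psi_{\dot n}$.

Starting with Point (ii), Sequence (4.34) ensures that $\mathrm{FFEM}_\Phi\text{-}\mathit{Insert}^\psi_{\dot n}$ implies
$\mathit{Delayed}^\psi_{\dot n}$. From Lemma 3.4.1(1) we then get $\mathit{DnSafe}^\psi_{\dot n}$ as desired. Finally,
the proof of (iii) is based on the fact that $\mathrm{FFEM}_\Phi\text{-}\mathit{Replace}^\psi_{\dot n}$ implies $\mathit{Comp}^\psi_{\dot n}$,
which delivers $\mathrm{LEM}_\Phi\text{-}\mathit{Correct}^\psi_{\dot n}$. Again (see the proof of (b) above), Lemma 4.5.3(1)
also forces $\mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^\psi_{\dot n}$. □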

4.5.6 Proving Computational Optimality 

The proof of computational optimality is also straightforward along the lines
of the corresponding proof for LFEM_Φ. As the counterpart of Lemma 4.4.4 we
get:

Lemma 4.5.6 (Adjustment Lemma for FFEM_Φ). Let $\psi \in \Phi$ and let $p \in \mathbf{P}$ be
an interval of program points between the earliest and the latest program point
(cf. Lemma 3.4.2(2)), i.e. (A1) $\mathit{Earliest}^\psi_{p_1}$, (A2) $\mathit{Latest}^\psi_{p_{|p|}}$ and (A3) $\forall 1 \le k < |p|.\ \mathit{Delayed}^\psi_{p_k} \wedge \neg\mathit{Latest}^\psi_{p_k}$. Then there is an index $1 \le i \le |p|$ such that:

1. $\forall 1 \le k \le |p|.\ \mathit{A\text{-}Delayed}^\psi_{p_k} \Leftrightarrow (1 \le k \le i)$

2. $\forall 1 \le k \le |p|.\ \mathit{A\text{-}Latest}^\psi_{p_k} \Leftrightarrow (i = k)$

This directly gives rise to the following result:

Theorem 4.5.3 (Computational Optimality Theorem for FFEM_Φ).

FFEM_Φ is computationally optimal, i.e. $\mathrm{FFEM}_\Phi \in \mathcal{COEM}_\Phi$.

4.5.7 Proving Lifetime Optimality 

As for the reasoning on inductive lifetime optimality, it is necessary to collect
a number of results concerned with lifetime ranges. Fortunately, all these
results are quite close to their counterparts in Section 4.4. Starting with the
counterpart to Lemma 4.4.6 we have:

Lemma 4.5.7 (UsedLater-Lemma). Let $EM_\Phi \in \mathcal{COEM}_\Phi$ and $\dot n \in \dot N$. Then

1. $\forall \psi \in \Phi.\ \mathit{DnSafe}^\psi_{\dot n} \wedge \mathit{Transp}^\psi_{\dot n} \Rightarrow \mathit{UsedLater}^\psi_{\dot n}$

2. $\forall \varphi \in \Phi.\ EM_\Phi\text{-}\mathit{Correct}^\varphi_{\dot n} \wedge \mathit{UsedLater}^\varphi_{\dot n} \Rightarrow \dot n \in \mathit{SLtRg}(EM_\Phi, \varphi)$

As for LFEM_Φ, the key to establishing lifetime optimality lies in a lemma that
captures the relationship between lifetime ranges of upper and lower trade-off
candidates. Along the lines of Lemma 4.4.9 we have:



Note that we do not need a counterpart of Lemma 4.4.6(2b). 
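Lemma 4.5.8 (Lifetime Range Lemma for FFEM_Φ).

Let $EM_\Phi \in \mathcal{COEM}_\Phi$ and $\dot n \in \dot N$. Then

1. $\forall \varphi \in \Theta^{\mathrm{lo}}_{\dot n}.\ \dot n \in \mathit{SLtRg}(EM_\Phi, \varphi) \Leftrightarrow EM_\Phi\text{-}\mathit{Correct}^\varphi_{\dot n} \wedge \exists \psi \in pred_{D_{\dot n}}(\varphi).\ \neg EM_\Phi\text{-}\mathit{Correct}^\psi_{\dot n}$

2. a) $\forall \psi \in \Theta^{\mathrm{up}}_{\dot n} \setminus \Theta^{\mathrm{lo}}_{\dot n}.\ \dot n \in \mathit{SLtRg}(\mathrm{FFEM}_\Phi, \psi) \Leftrightarrow \mathit{Change}^\psi_{\dot n}$

   b) $\forall \varphi \in \Theta^{\mathrm{lo}}_{\dot n} \setminus \Theta^{\mathrm{up}}_{\dot n}.\ \dot n \in \mathit{SLtRg}(\mathrm{FFEM}_\Phi, \varphi) \Leftrightarrow \varphi \notin T_{\dot n}$

   c) $\forall \varphi \in \Theta^{\mathrm{up}}_{\dot n} \cap \Theta^{\mathrm{lo}}_{\dot n}.\ \dot n \in \mathit{SLtRg}(\mathrm{FFEM}_\Phi, \varphi) \Leftrightarrow \varphi \in neigh(T_{\dot n}) \setminus T_{\dot n}$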

Part (1) is almost identical to its counterpart Lemma 4. 4. 9(1). The only 
difference is that now the right-hand side of the equivalence has to require 
correctness with respect to p. This is for the reason that - as opposed to the 
levelwise approach - lower trade-off candidates may violate the correctness 
predicate. In fact, a straightforward adaption of Lemma 4. 4. 9(1) would not be 
true either, since under the new situation we might be faced with expressions 
of type O, whose lifetime ranges at h are only local. 

The second part of the lemma differs slightly from Lemma 4.4.9(2). The
most striking difference is that the characterisation is divided into three parts
rather than two. Again this is due to the fact that expressions of type $\circ$
may only be associated with local lifetime ranges, which requires a separate
reasoning in this case.

Due to the important role that this lemma plays in the proof of lifetime
optimality, we elaborate its proof in full detail, even though some parts are
similar to the arguments given in the proof of Lemma 4.4.9.

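Proof of 1: Starting with the $\Rightarrow$-direction, let us assume $\varphi \in \Theta^{dn}_{\dot n}$ such
that $\dot n \in \mathit{SLtRg}(\mathrm{EM}_\Phi, \varphi)$. First, according to Lemma 4.2.7 this immediately
implies $\mathrm{EM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n}$. On the other hand, we have $\neg\mathit{UsedLater}^{\varphi}_{\dot n}$ and thus can
exclude a strictly later original replacement of $\varphi$. Hence the definition of
$\mathit{SLtRg}(\mathrm{EM}_\Phi, \varphi)$ yields that there is an expression $\psi \in \mathit{SupExpr}_\Phi(\varphi)$ such
that $\varphi$ is used for the initialisation of this superexpression. That means there
is a path $p \in \mathbf{P}[\dot n, \dot e]$ and an index $1 < j \le |p|$ with

$$\mathrm{EM}_\Phi\text{-}\mathit{Insert}^{\psi}_{p_j} \wedge \forall\, 1 < k \le j.\ \neg\mathrm{EM}_\Phi\text{-}\mathit{Insert}^{\varphi}_{p_k} \qquad (4.35)$$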

First, we are going to show

$$\forall\, 1 < k \le j.\ \neg\mathit{Earliest}^{\varphi}_{p_k} \qquad (4.36)$$

Suppose there were an index $1 < k \le j$ such that $\mathit{Earliest}^{\varphi}_{p_k}$ holds. Since
$\mathrm{EM}_\Phi$ is a structured expression motion, we have in particular $\mathrm{EM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{p_j}$.
Hence, according to Lemma 3.4.2(2), there would be an index $k \le l \le j$ with
$\mathrm{EM}_\Phi\text{-}\mathit{Insert}^{\varphi}_{p_l}$, in contradiction to the assumption in (4.35).
Furthermore, due to Lemma 3.4.2(2), $\mathrm{EM}_\Phi\text{-}\mathit{Insert}^{\psi}_{p_j}$ implies that program point
$p_j$ must be preceded on every program path from $\dot s$ to $p_j$ by a program
point that satisfies $\mathit{Earliest}^{\psi}$. Suppose such a position were on the path
$(p_1, \ldots, p_j)$ strictly behind $\dot n$, i.e. there is an index $1 < k \le j$ such that
$\mathit{Earliest}^{\psi}_{p_k}$ holds. Together with Property (4.36) this delivers

$$\mathit{Earliest}^{\psi}_{p_k} \wedge \forall\, 1 < l \le k.\ \neg\mathit{Earliest}^{\varphi}_{p_l},$$

which means $\mathit{UsedLater}^{\varphi}_{\dot n}$, in contradiction to the assumption $\varphi \in \Theta^{dn}_{\dot n}$.
Hence we have

$$\forall\, 1 \le k < j.\ \mathit{Delayed}^{\psi}_{p_k} \wedge \neg\mathit{Latest}^{\psi}_{p_k}$$

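In particular, this means $\psi \in \Theta^{up}_{\dot n}$ and thus $\psi \in \mathit{pred}_{D_{\dot n}}(\varphi)$. On the other
hand, we also have $\neg\mathrm{EM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$, as otherwise $\mathrm{EM}_\Phi$ would have two
$\psi$-insertions on an interval of delayable program points, in contradiction to
Lemma 3.4.2(2).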

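For the proof of the $\Leftarrow$-direction we use Lemma 3.4.2(2), which yields
that the assumption $\neg\mathrm{EM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$ together with $\mathit{Delayed}^{\psi}_{\dot n}$ implies that
there is a path $p \in \mathbf{P}[\dot n, \dot e]$ and an index $1 < j \le |p|$ with:

$$\mathrm{EM}_\Phi\text{-}\mathit{Insert}^{\psi}_{p_j} \wedge \forall\, 1 \le k < j.\ \mathit{Delayed}^{\psi}_{p_k} \wedge \neg\mathit{Latest}^{\psi}_{p_k} \qquad (4.37)$$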

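This almost establishes the desired result $\dot n \in \mathit{SLtRg}(\mathrm{EM}_\Phi, \varphi)$: $\mathrm{EM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n}$
holds by assumption and, in addition, $p_j$ defines a potential use-site of $h_\varphi$. In
order to ensure that $p_j$ is actually a use-site of $h_\varphi$, we are left with the proof
that $\mathrm{EM}_\Phi$ has no other $\varphi$-insertions between $\dot n$ and $p_j$, i.e.

$$\forall\, 1 < k \le j.\ \neg\mathrm{EM}_\Phi\text{-}\mathit{Insert}^{\varphi}_{p_k} \qquad (4.38)$$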

To this end, let us assume an index $1 < k \le j$ with $\mathrm{EM}_\Phi\text{-}\mathit{Insert}^{\varphi}_{p_k}$. Due
to our assumption $\mathrm{EM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n}$, Lemma 3.4.2(2) then implies that there
must be an index $1 < l \le k$ such that $\mathit{Earliest}^{\varphi}_{p_l}$ holds. This, however,
contradicts Lemma 4.2.5(3), which is applicable since Property (4.37)
forces $\mathit{Delayed}^{\psi}_{p_l} \wedge \neg\mathit{Earliest}^{\psi}_{p_l}$.

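Proof of 2: For the proof of Part (2a) we start by providing two
auxiliary properties. First we have for upper trade-off candidates:

$$\forall \varphi \in \Theta^{up}_{\dot n}.\ \mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n} \Leftrightarrow \mathit{Change}^{\varphi}_{\dot n} \qquad (4.39)$$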
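Starting with the contrapositive of the $\Rightarrow$-direction we have

$$
\begin{aligned}
&\neg\mathit{Change}^{\varphi}_{\dot n}\\
\Rightarrow\ &\forall m \in \mathit{succ}(\dot n).\ \textit{A-Delayed}^{\varphi}_{m} && \text{[Lemma 4.5.4]}\\
\Rightarrow\ &\forall m \in \mathit{succ}(\dot n).\ \textit{A-Delayed}^{\varphi}_{m} \wedge \neg\mathit{Earliest}^{\varphi}_{m} && \text{[Lemma 4.5.5(2a)]}\\
\Rightarrow\ &\textit{A-Delayed}^{\varphi}_{\dot n} \wedge \neg\textit{A-Latest}^{\varphi}_{\dot n} && \text{[Definition A-Delayed]}\\
\Rightarrow\ &\neg\mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n} && \text{[Lemma 4.5.6]}
\end{aligned}
$$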

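The $\Leftarrow$-direction is similar:

$$
\begin{aligned}
&\mathit{Change}^{\varphi}_{\dot n}\\
\Rightarrow\ &\mathit{Change}^{\varphi}_{\dot n} \wedge \mathit{Delayed}^{\varphi}_{\dot n} \wedge \neg\mathit{Latest}^{\varphi}_{\dot n} && [\varphi \in \Theta^{up}_{\dot n}]\\
\Rightarrow\ &\forall m \in \mathit{succ}(\dot n).\ \neg\textit{A-Delayed}^{\varphi}_{m} \wedge \mathit{Delayed}^{\varphi}_{\dot n} \wedge \neg\mathit{Latest}^{\varphi}_{\dot n} && \text{[Lemma 4.5.4]}\\
\Rightarrow\ &\mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n} && \text{[Lemma 4.5.6]}
\end{aligned}
$$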

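The second auxiliary property addresses the lower trade-off candidates:

$$\forall \varphi \in \Theta^{dn}_{\dot n}.\ \varphi \in \mathcal{T}(\Theta^{dn}_{\dot n}) \Leftrightarrow \forall \psi \in \mathit{pred}_{D_{\dot n}}(\varphi).\ \mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n} \qquad (4.40)$$

Here we have

$$
\begin{aligned}
&\varphi \in \mathcal{T}(\Theta^{dn}_{\dot n})\\
\Leftrightarrow\ &\forall \psi \in \mathit{neigh}(\varphi).\ \mathit{Change}^{\psi}_{\dot n} && \text{[Def. Change]}\\
\Leftrightarrow\ &\forall \psi \in \mathit{neigh}(\varphi).\ \mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n} && \text{[Prop. (4.39)]}\\
\Leftrightarrow\ &\forall \psi \in \mathit{pred}_{D_{\dot n}}(\varphi).\ \mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n} && [(\dagger)]
\end{aligned}
$$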

The equivalence marked $(\dagger)$ requires a short explanation. Whereas the
forward implication is obvious, for the backward direction let us assume that
$\mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n}$ holds for the expressions $\psi \in \mathit{pred}_{D_{\dot n}}(\varphi)$. Due to
Theorem 4.5.2, Lemma 4.2.4 becomes applicable, yielding $\mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^{\varphi'}_{\dot n}$ for all
$\varphi' \in \mathit{SubExpr}^{*}_\Phi(\psi)$. Hence, due to the definition of edges in the associated
bipartite graph (cf. Algorithm 4.5.1), this ensures that $\mathrm{FFEM}_\Phi\text{-}\mathit{Correct}$ even
holds for every expression in $\mathit{neigh}(\varphi)$.

Now we can turn to the proofs of Parts (2a) to (2c). The $\Rightarrow$-direction of
Part (2a) is a direct consequence of the forward implication of Proposition
(4.39) together with Lemma 4.2.7. For the $\Leftarrow$-direction we can exploit that
$\varphi \in \Theta^{up}_{\dot n} \setminus \Theta^{dn}_{\dot n}$ delivers $\mathit{UsedLater}^{\varphi}_{\dot n}$. Together with the backward direction
of Proposition (4.39) this implies

$$\mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n} \wedge \mathit{UsedLater}^{\varphi}_{\dot n},$$

which according to Lemma 4.5.7(2) yields $\dot n \in \mathit{SLtRg}(\mathrm{FFEM}_\Phi, \varphi)$.
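For Part (2b) we have

$$
\begin{aligned}
&\varphi \in \mathcal{T}(\Theta^{dn}_{\dot n})\\
\Leftrightarrow\ &\forall \psi \in \mathit{pred}_{D_{\dot n}}(\varphi).\ \mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n} && \text{[Proposition (4.40)]}\\
\Leftrightarrow\ &\dot n \notin \mathit{SLtRg}(\mathrm{FFEM}_\Phi, \varphi) && [\text{Part (1) with } \varphi \in \Theta^{dn}_{\dot n} \setminus \Theta^{up}_{\dot n}]
\end{aligned}
$$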



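Finally, let us investigate the new point (2c), whose proof is also quite simple
using Propositions (4.39) and (4.40):

$$
\begin{aligned}
&\varphi \in \mathit{neigh}(\mathcal{T}(\Theta^{dn}_{\dot n})) \setminus \mathcal{T}(\Theta^{dn}_{\dot n})\\
\Leftrightarrow\ &\mathit{Change}^{\varphi}_{\dot n} \wedge \varphi \notin \mathcal{T}(\Theta^{dn}_{\dot n}) && \text{[Def. Change]}\\
\Leftrightarrow\ &\mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n} \wedge \exists \psi \in \mathit{pred}_{D_{\dot n}}(\varphi).\ \neg\mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^{\psi}_{\dot n} && \text{[Prop. (4.39) \& (4.40)]}\\
\Leftrightarrow\ &\dot n \in \mathit{SLtRg}(\mathrm{FFEM}_\Phi, \varphi) && \text{[Part (1)]}
\end{aligned}
$$

□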



Now we finally succeed in proving the main result:

Theorem 4.5.4 (Lifetime Optimality Theorem for $\mathrm{FFEM}_\Phi$).

$\mathrm{FFEM}_\Phi$ is lifetime optimal.



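Proof. Let us consider a structured expression motion $\mathrm{EM}_\Phi \in \mathcal{COEM}_\Phi$. In
order to prove that $\mathrm{FFEM}_\Phi$ is lifetime better than $\mathrm{EM}_\Phi$, we have to show for every $\dot n \in \dot N$ the inequation:

$$|\{\varphi \in \Phi \mid \dot n \in \mathit{SLtRg}(\mathrm{FFEM}_\Phi, \varphi)\}| \le |\{\varphi \in \Phi \mid \dot n \in \mathit{SLtRg}(\mathrm{EM}_\Phi, \varphi)\}|$$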



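For the sake of a simple notation, let us fix $\dot n \in \dot N$ and introduce the following
abbreviations:

$$A_{\mathrm{EM}} := \{\varphi \in \Phi \mid \dot n \in \mathit{SLtRg}(\mathrm{EM}_\Phi, \varphi)\}$$

$$A_{\mathrm{FFEM}} := \{\varphi \in \Phi \mid \dot n \in \mathit{SLtRg}(\mathrm{FFEM}_\Phi, \varphi)\}.$$

Furthermore, for both sets the subsets of expressions of type $\blacksquare$, $\square$, $\bullet$ and $\circ$ are
indicated by the corresponding superscript, e.g.
$A^{\blacksquare}_{\mathrm{EM}} := \{\varphi \in A_{\mathrm{EM}} \mid \varphi \text{ is of type } \blacksquare\}$. Before we turn to the estimation of the number of
lifetime ranges at $\dot n$ we collect some elementary properties. We start
with propositions characterising lifetime ranges that are outside the scope of
trade-offs: expressions not in $\Theta_{\dot n}$ do not contribute to lifetime
ranges at $\dot n$ at all, and lifetime ranges of expressions of type $\blacksquare$ are independent
of the expression motion under consideration:

$$A_{\mathrm{EM}} \subseteq \Theta_{\dot n} \qquad (4.42a)$$

$$A_{\mathrm{FFEM}} \subseteq \Theta_{\dot n} \qquad (4.42b)$$

$$A^{\blacksquare}_{\mathrm{FFEM}} = A^{\blacksquare}_{\mathrm{EM}} \qquad (4.42c)$$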
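Starting with Inclusion (4.42a), let us assume $\varphi \in A_{\mathrm{EM}}$. According to Lemma
4.2.7 this implies $\mathrm{EM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n}$, which due to Lemma 3.4.1(2) means either
$\mathrm{LEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n}$ or $\mathit{Delayed}^{\varphi}_{\dot n}$. As $\mathit{Latest}^{\varphi}_{\dot n}$ is subsumed by $\mathrm{LEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n}$, this
can even be strengthened to

$$\mathrm{LEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n} \vee (\mathit{Delayed}^{\varphi}_{\dot n} \wedge \neg\mathit{Latest}^{\varphi}_{\dot n}),$$

which by definition is equivalent to $\varphi \in \Theta_{\dot n}$. Inclusion (4.42b) follows from an
analogous argument. For the proof of Equation (4.42c), let us assume $\varphi \in A^{\blacksquare}_{\mathrm{EM}}$.
By definition this means $\mathrm{LEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n}$ and $\mathit{UsedLater}^{\varphi}_{\dot n}$. As $\mathrm{LEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n}$
implies $\mathrm{FFEM}_\Phi\text{-}\mathit{Correct}^{\varphi}_{\dot n}$ (cf. Lemma 4.5.6), we obtain using Lemma 4.5.7(2)
that $\varphi \in A^{\blacksquare}_{\mathrm{FFEM}}$. The inclusion $A^{\blacksquare}_{\mathrm{FFEM}} \subseteq A^{\blacksquare}_{\mathrm{EM}}$ follows in a symmetric way.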

The central ingredients of the proof, however, are some propositions that 
close the gap between lifetime ranges and trade-off pairs. With 




and 

=* {pGOn\^'<pG U A°^. p G smccJ). (V>)} 
these propositions read as: 

(S'f®,S'l‘^) is a trade-off pair in Dn (4.43a) 

7^-(H^) = = Of\{0'^^ U Ag*J (4.43b) 

5= Agf,^ l±l A^^ (4.43c) 

Considering Proposition (4.43a) conditions (1), (2) and (4) of Definition 4.5.1 
are trivially satisfied. For the remaining Point (3) we have to show that a 
register expression p G S'!® is covered by S'!®. Let i) G predD^i^p)- Ac- 
cording to the definition of S'!® Lemma 4. 5. 8(1) becomes applicable yielding 
EH^-Correct{^ . In the case that ip is of type □ Lemma 4. 4. 6(2) then yields 
%l> G A^^ . Otherwise, if ip is of type O we either have ip G A^^ , in which case 

we are done, or ip ^ A^^, which establishes the same situation as before with 
■ip taking the role of p. For the reason that expressions of maximum level in 
O'^^ satisfy property UsedLater (by Lemma 4. 4. 6(1)) we eventually succeed 
in showing tp G succ*jj^{ip') for some ip' G A^^ U A^^. 

Because Opp \ O^p only contains expressions of type A*g„^ , Equation (4.43b) 
is trivial. 

For Inclusion (4.43c) let us assume p G 7l]p{Stp) and show that p G A^^ l±) 
A^is • the case that p is of type □ we have by definition 
ISM.fp-Correct'f A UsedLater^f ,
which by Corollary 3.4.1 implies 
EK^-Correct!f A UsedLater'f 

Hence by using Lemma 4.5. 7(2) we succeed in showing ip G In the re- 
maining case that p is of type O let us assume that p ^ 
the same argument as in the proof of Property (4.43a) delivers that all pre- 
decessors in predonip) are part of too, which would be in contradiction 
to the relevancy of p. Hence we obtain p G as desired. 

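Then we finally obtain the following sequence of (in)equations, where $\mathcal{T}$ abbreviates $\mathcal{T}(\Theta^{dn}_{\dot n})$:

$$
\begin{aligned}
|A_{\mathrm{EM}}|
&= |A_{\mathrm{EM}} \cap \Theta_{\dot n}| && \text{[Inclusion (4.42a)]}\\
&= |A^{\blacksquare}_{\mathrm{EM}}| + |A^{\square}_{\mathrm{EM}}| + |A^{\bullet}_{\mathrm{EM}}| + |A^{\circ}_{\mathrm{EM}}| && [\text{Splitting } A_{\mathrm{EM}} \cap \Theta_{\dot n}]\\
&= |A^{\square}_{\mathrm{EM}}| + |A^{\bullet}_{\mathrm{EM}}| + |A^{\circ}_{\mathrm{EM}}| + |A^{\blacksquare}_{\mathrm{FFEM}}| && \text{[Equation (4.42c)]}\\
&= |\Theta^{dn}_{\dot n} \setminus \Theta^{up}_{\dot n}| - \big((|\Theta^{dn}_{\dot n} \setminus \Theta^{up}_{\dot n}| - |A^{\bullet}_{\mathrm{EM}}|) - (|A^{\square}_{\mathrm{EM}}| + |A^{\circ}_{\mathrm{EM}}|)\big) + |A^{\blacksquare}_{\mathrm{FFEM}}|\\
&= |\Theta^{dn}_{\dot n} \setminus \Theta^{up}_{\dot n}| - \big(|\Theta^{dn}_{\dot n} \setminus (\Theta^{up}_{\dot n} \cup A^{\bullet}_{\mathrm{EM}})| - |A^{\square}_{\mathrm{EM}} \uplus A^{\circ}_{\mathrm{EM}}|\big) + |A^{\blacksquare}_{\mathrm{FFEM}}| && [A^{\bullet}_{\mathrm{EM}} \subseteq \Theta^{dn}_{\dot n} \setminus \Theta^{up}_{\dot n},\ A^{\square}_{\mathrm{EM}} \cap A^{\circ}_{\mathrm{EM}} = \emptyset]\\
&\ge |\Theta^{dn}_{\dot n} \setminus \Theta^{up}_{\dot n}| - \big(|\mathcal{T}| - |\mathit{neigh}(\mathcal{T})|\big) + |A^{\blacksquare}_{\mathrm{FFEM}}| && \text{[(4.43a), (4.43b), (4.43c), Theorem 4.5.1 \& Theorem 4.3.3]}\\
&= |\Theta^{dn}_{\dot n} \setminus \Theta^{up}_{\dot n}| - \big(|\mathcal{T} \setminus \Theta^{up}_{\dot n}| + |\mathcal{T} \cap \Theta^{up}_{\dot n}| - |\mathit{neigh}(\mathcal{T}) \cap \Theta^{dn}_{\dot n}| - |\mathit{neigh}(\mathcal{T}) \setminus \Theta^{dn}_{\dot n}|\big) + |A^{\blacksquare}_{\mathrm{FFEM}}| && [\text{Decomposing } \mathcal{T} \text{ and } \mathit{neigh}(\mathcal{T})]\\
&= |\Theta^{dn}_{\dot n} \setminus \Theta^{up}_{\dot n}| - |\mathcal{T} \setminus \Theta^{up}_{\dot n}| + |(\mathit{neigh}(\mathcal{T}) \setminus \mathcal{T}) \cap \Theta^{up}_{\dot n} \cap \Theta^{dn}_{\dot n}| + |\mathit{neigh}(\mathcal{T}) \setminus \Theta^{dn}_{\dot n}| + |A^{\blacksquare}_{\mathrm{FFEM}}| && [\text{Combining } \mathcal{T} \cap \Theta^{up}_{\dot n} \text{ and } \mathit{neigh}(\mathcal{T}) \cap \Theta^{dn}_{\dot n}]\\
&= |\Theta^{dn}_{\dot n} \setminus \Theta^{up}_{\dot n}| - |(\Theta^{dn}_{\dot n} \setminus \Theta^{up}_{\dot n}) \setminus A^{\bullet}_{\mathrm{FFEM}}| + |A^{\circ}_{\mathrm{FFEM}}| + |A^{\square}_{\mathrm{FFEM}}| + |A^{\blacksquare}_{\mathrm{FFEM}}| && \text{[Lemma 4.5.8(2a,b,c)]}\\
&= |A^{\bullet}_{\mathrm{FFEM}}| + |A^{\circ}_{\mathrm{FFEM}}| + |A^{\square}_{\mathrm{FFEM}}| + |A^{\blacksquare}_{\mathrm{FFEM}}| && [A^{\bullet}_{\mathrm{FFEM}} \subseteq \Theta^{dn}_{\dot n} \setminus \Theta^{up}_{\dot n}]\\
&= |A_{\mathrm{FFEM}} \cap \Theta_{\dot n}| && [\text{Combining}]\\
&= |A_{\mathrm{FFEM}}| && \text{[Inclusion (4.42b)]}
\end{aligned}
$$

□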



4.5.8 Computing $\mathrm{FFEM}_\Phi$

As in the section on $\mathrm{LFEM}_\Phi$, we complete this section with an
algorithmically oriented summary of the steps of $\mathrm{FFEM}_\Phi$.

Table 4.4 gives a survey of the general structure of the algorithm. Again the
algorithm is a refinement approach based on the original analyses
belonging to $\mathrm{BEM}_\Phi$ and $\mathrm{LEM}_\Phi$. However, the absence of a counterpart to the
first adjustment step of $\mathrm{LFEM}_\Phi$ and the general kind of trade-offs (as opposed
to the levelwise approach) make the description of $\mathrm{FFEM}_\Phi$ significantly
simpler. In fact, $\mathrm{LEM}_\Phi$ essentially only has to be supplemented by a preprocess
computing the new local Change-property, which requires only one
additional data flow analysis that can even be performed in a bit-vector fashion.




106 4. Optimal Expression Motion: The Multiple-Expression View 



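1. Used-Later Analysis ($\varphi \in \mathit{SubExpr}_\Phi(\Phi^i)$):

$$\mathit{NUSEDLAT}^{\varphi}_{n} = \overline{\textit{A1-XEarliest}^{\varphi}_{n}} \cdot \Big(\sum_{\psi \in \mathit{SupExpr}_{\Phi^i}(\varphi)} \textit{A1-XEarliest}^{\psi}_{n} + \mathit{XUSEDLAT}^{\varphi}_{n}\Big)$$

$$\mathit{XUSEDLAT}^{\varphi}_{n} = \sum_{m \in \mathit{succ}(n)} \overline{\textit{A1-NEarliest}^{\varphi}_{m}} \cdot \Big(\mathrm{LFEM}_\Phi\text{-}\mathit{Replace}^{\varphi}_{m} + \sum_{\psi \in \mathit{SupExpr}_{\Phi^i}(\varphi)} \textit{A1-NEarliest}^{\psi}_{m} + \sum_{\psi \in \mathit{SupExpr}_{\Phi^{>i}}(\varphi)} \mathit{Comp}^{\psi}_{m} + \mathit{NUSEDLAT}^{\varphi}_{m}\Big)$$

Least fixed point solution: $\mathit{NUsedLater}[\Phi^i]$ and $\mathit{XUsedLater}[\Phi^i]$

2. Computation of $\Theta^{dn(i)}_{\dot n}$ and $\Theta^{up(i)}_{\dot n}$ for the exit point $\dot n$ of $n$:

a) Determine $\Theta^{dn(i)}_{\dot n}$ by means of point 1).

b) Determine $\Theta^{up(i)}_{\dot n}$ by means of points 4) & 5) of Table 4.2.

3. Computation of $\mathcal{T}(\Theta^{dn(i)}_{\dot n})$:

Determine $\mathcal{T}(\Theta^{dn(i)}_{\dot n})$ by means of Algorithm 4.3.1.

$$\mathit{LChange}^{\varphi}_{n} :\Leftrightarrow \varphi \in \mathit{neigh}(\mathcal{T}(\Theta^{dn(i)}_{\dot n}))$$

4. Adjusting Delayability ($\varphi \in \Phi^i$; no data flow analysis! Compute $\textit{A2-NDelayed}^{\varphi}_{n}$ before $\textit{A2-XDelayed}^{\varphi}_{n}$):

$$\textit{A2-NDelayed}^{\varphi}_{n} = \prod_{m \in \mathit{pred}(n)} \textit{A1-XDelayed}^{\varphi}_{m} \cdot \overline{\mathit{LChange}^{\varphi}_{m}}$$

$$\textit{A2-XDelayed}^{\varphi}_{n} = \textit{A2-NDelayed}^{\varphi}_{n} \cdot \overline{\mathit{Comp}^{\varphi}_{n}}$$

5. Adjusting Latestness ($\varphi \in \Phi^i$; no data flow analysis!):

$$\textit{A2-NLatest}^{\varphi}_{n} = \textit{A2-NDelayed}^{\varphi}_{n} \cdot \mathit{Comp}^{\varphi}_{n}$$

$$\textit{A2-XLatest}^{\varphi}_{n} = \textit{A2-XDelayed}^{\varphi}_{n} \cdot \sum_{m \in \mathit{succ}(n)} \overline{\textit{A2-NDelayed}^{\varphi}_{m}}$$

Table 4.3. Computing the second adjustment of $\mathrm{LFEM}_\Phi$ with respect to level $i > 0$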
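1. Replacement Points ($\varphi \in \Phi$):

$$\mathrm{FFEM}_\Phi\text{-}\mathit{Replace}^{\varphi}_{n} :\Leftrightarrow \varphi \in \mathit{MaxSubExpr}_\Phi(\varphi^{RHS}_{n})$$

where $\varphi^{RHS}_{n}$ is the RHS expression at $n$

2. Relevant Global Analyses ($\varphi \in \Phi$):

Computation of all relevant global predicates of $\mathrm{LEM}_\Phi$ according to Table 3.1
and Table 3.2:

- $\mathit{NDnSafe}^{\varphi}_{n}$, $\mathit{XDnSafe}^{\varphi}_{n}$, $\mathit{NUpSafe}^{\varphi}_{n}$, $\mathit{XUpSafe}^{\varphi}_{n}$, $\mathit{NEarliest}^{\varphi}_{n}$, $\mathit{XEarliest}^{\varphi}_{n}$

- $\mathit{NDelayed}^{\varphi}_{n}$, $\mathit{XDelayed}^{\varphi}_{n}$, $\textit{N-Latest}^{\varphi}_{n}$, $\textit{X-Latest}^{\varphi}_{n}$

3. Adjustment of Delayability:

Adjust the delayability predicate as described in Table 4.5.

4. Insertion Points ($\varphi \in \Phi$):

Determine the insertion points by:

$$\mathrm{FFEM}_\Phi\text{-}\mathit{NInsert}^{\varphi}_{n} :\Leftrightarrow \textit{A-NLatest}^{\varphi}_{n}$$

$$\mathrm{FFEM}_\Phi\text{-}\mathit{XInsert}^{\varphi}_{n} :\Leftrightarrow \textit{A-XLatest}^{\varphi}_{n}$$

Table 4.4. Skeleton of the $\mathrm{FFEM}_\Phi$-algorithm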
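1. Used-Later Analysis ($\varphi \in \Phi$):

$$\mathit{NUSEDLAT}^{\varphi}_{n} = \overline{\mathit{XEarliest}^{\varphi}_{n}} \cdot \Big(\sum_{\psi \in \mathit{SupExpr}_{\Phi}(\varphi)} \mathit{XEarliest}^{\psi}_{n} + \mathit{XUSEDLAT}^{\varphi}_{n}\Big)$$

$$\mathit{XUSEDLAT}^{\varphi}_{n} = \sum_{m \in \mathit{succ}(n)} \overline{\mathit{NEarliest}^{\varphi}_{m}} \cdot \Big(\mathrm{FFEM}_\Phi\text{-}\mathit{Replace}^{\varphi}_{m} + \sum_{\psi \in \mathit{SupExpr}_{\Phi}(\varphi)} \mathit{NEarliest}^{\psi}_{m} + \mathit{NUSEDLAT}^{\varphi}_{m}\Big)$$

Least fixed point solution: $\mathit{NUsedLater}$ and $\mathit{XUsedLater}$

2. Computation of $\Theta^{up}_{\dot n}$ and $\Theta^{dn}_{\dot n}$ for the exit point $\dot n$ of $n$:

a) Determine $\Theta^{up}_{\dot n}$ by means of Point 2) of Table 4.4.

b) Determine $\Theta^{dn}_{\dot n}$ by means of Point 1).

c) Determine the edges between $\Theta^{dn}_{\dot n}$ and $\Theta^{up}_{\dot n}$:

$$\forall \varphi \in \Theta^{dn}_{\dot n},\ \psi \in \Theta^{up}_{\dot n}.\ \{\varphi, \psi\} \in E \Leftrightarrow \psi \in \mathit{SubExpr}^{*}_{\Theta^{up}_{\dot n}}\big(\mathit{SupExpr}_{\Theta^{up}_{\dot n}}(\varphi)\big)$$

3. Computation of $\mathcal{T}(\Theta^{dn}_{\dot n})$:

Determine $\mathcal{T}(\Theta^{dn}_{\dot n})$ by means of Algorithm 4.3.1.

$$\mathit{Change}^{\varphi}_{n} :\Leftrightarrow \varphi \in \mathit{neigh}(\mathcal{T}(\Theta^{dn}_{\dot n}))$$

4. Adjusting Delayability ($\varphi \in \Phi$; no data flow analysis! Compute $\textit{A-NDelayed}^{\varphi}_{n}$ before $\textit{A-XDelayed}^{\varphi}_{n}$):

$$\textit{A-NDelayed}^{\varphi}_{n} = \prod_{m \in \mathit{pred}(n)} \mathit{XDelayed}^{\varphi}_{m} \cdot \overline{\mathit{Change}^{\varphi}_{m}}$$

$$\textit{A-XDelayed}^{\varphi}_{n} = \textit{A-NDelayed}^{\varphi}_{n} \cdot \overline{\mathit{Comp}^{\varphi}_{n}}$$

5. Adjusting Latestness ($\varphi \in \Phi$; no data flow analysis!):

$$\textit{A-NLatest}^{\varphi}_{n} = \textit{A-NDelayed}^{\varphi}_{n} \cdot \mathit{Comp}^{\varphi}_{n}$$

$$\textit{A-XLatest}^{\varphi}_{n} = \textit{A-XDelayed}^{\varphi}_{n} \cdot \sum_{m \in \mathit{succ}(n)} \overline{\textit{A-NDelayed}^{\varphi}_{m}}$$

Table 4.5. Computing the adjustment step of $\mathrm{FFEM}_\Phi$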
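To make the used-later analysis of Table 4.5(1) more concrete, the following minimal Python sketch solves its two equations for a single expression $\varphi$ by round-robin iteration towards the least fixed point. It assumes the reading of the equations given in the table, a node-level flow graph given as a successor map, per-node sets of expressions that are earliest at node entry and exit, and a set of nodes at which $\varphi$ is replaced; all names (`used_later`, `n_earliest`, `x_earliest`, `replace`, `sup`) are hypothetical and not taken from the text.

```python
def used_later(nodes, succ, phi, sup, n_earliest, x_earliest, replace):
    """Least fixed point of the used-later equations for one expression phi.

    nodes      -- node list; visiting successors first speeds convergence
    succ       -- dict: node -> list of successor nodes
    sup        -- dict: expression -> list of its superexpressions
    n_earliest -- dict: node -> set of expressions earliest at the node entry
    x_earliest -- dict: node -> set of expressions earliest at the node exit
    replace    -- set of nodes whose original occurrence of phi is replaced
    """
    nused = {n: False for n in nodes}  # least fixed point: start with False
    xused = {n: False for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            # XUSEDLAT: some successor m, where phi is not earliest on entry,
            # replaces phi, has an earliest superexpression, or uses phi later.
            x_new = any(
                phi not in n_earliest[m]
                and (m in replace
                     or any(psi in n_earliest[m] for psi in sup.get(phi, []))
                     or nused[m])
                for m in succ[n])
            # NUSEDLAT: phi is not earliest at the exit of n, and either a
            # superexpression is earliest there or phi is used later from there.
            n_new = phi not in x_earliest[n] and (
                any(psi in x_earliest[n] for psi in sup.get(phi, [])) or x_new)
            if (n_new, x_new) != (nused[n], xused[n]):
                nused[n], xused[n] = n_new, x_new
                changed = True
    return nused, xused


# Hypothetical straight-line example 1 -> 2 -> 3: the superexpression
# (a+b)*c becomes earliest at the entry of node 3, so a+b is used later
# along the path before it becomes earliest again.
nused, xused = used_later(
    [3, 2, 1], {1: [2], 2: [3], 3: []}, "a+b", {"a+b": ["(a+b)*c"]},
    {1: {"a+b"}, 2: set(), 3: {"(a+b)*c"}}, {1: set(), 2: set(), 3: set()},
    set())
```

Iterating until no value changes mirrors the "least fixed point solution" line of the table; a bit-vector implementation would treat all expressions simultaneously instead of one `phi` at a time.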
4.6 The Complexity

In this section we discuss the complexity of the multiple-expression motion
approaches presented in this chapter. In particular, we will see that
$\mathrm{LFEM}_\Phi$ and $\mathrm{FFEM}_\Phi$ only impose moderate additional costs.



4.6.1 The Complexity of $\mathrm{BEM}_\Phi$ and $\mathrm{LEM}_\Phi$

The complexity of $\mathrm{BEM}_\Phi$ and $\mathrm{LEM}_\Phi$ can be obtained straightforwardly by
applying Theorem 3.6.1 of Section 3.6. Hence:

Theorem 4.6.1 (Complexity of $\mathrm{BEM}_\Phi$ and $\mathrm{LEM}_\Phi$). $\mathrm{BEM}_\Phi$ and $\mathrm{LEM}_\Phi$ can be
performed with run-time complexity of order $O(|G|\,|\Phi|)$ for both flat and
structured sets of expressions $\Phi$.



4.6.2 The Complexity of $\mathrm{LFEM}_\Phi$

Let us now investigate the worst-case time complexity of $\mathrm{LFEM}_\Phi$. Obviously,
the "old" global data flow analyses addressed in Table 4.1(3) can be solved in
order $O(|G|\,|\Phi|)$ as before. The same also applies to the refinement steps
mentioned in Table 4.2(4 & 5) and Table 4.3(4 & 5) and to the new global correctness
analysis of Table 4.2(1).

The refinement step of Table 4.2(2) and the new global used-later analysis
of Table 4.3(1), however, refer to expressions of both $\Phi^i$ and $\Phi^{<i}$ for a
level $i$ under consideration. Still, it is easy to see that both steps are of
order $O(|\Phi^i|\,|\Phi^{<i}|)$ for each level $i$ and exit point $\dot n$. Summarised over
all levels and nodes of the flow graph this amounts to

$$O\Big(\sum_{\dot n \in \dot N} \sum_{i \ge 0} |\Phi^i|\,|\Phi^{<i}|\Big) \le O\Big(|G|\,|\Phi| \sum_{i \ge 0} |\Phi^i|\Big) = O(|G|\,|\Phi|^2)$$

If the arities of operators are assumed to be bounded by a constant, then the
estimation for each level reduces to $O(|\Phi^i|)$, which results in $O(|G|\,|\Phi|)$
for the complete analysis.

Finally, we have to examine the costs of step (3) of Table 4.3, which
computes the local trade-off information at a node. As mentioned on page
53, the computational complexity of determining a maximum matching of a
bipartite graph $(V, E)$ is of order $O(|V|^{1/2}\,|E|)$. Algorithm 4.3.1, which
computes the largest tight set of a bipartite graph $(S \uplus T, E)$, is subsumed by
this estimation, since, provided that a maximum matching has already been
computed in advance, the algorithm terminates within a bound of order
$O(|E|)$. This is due to the fact that processing a vertex in $R$ requires at most
an inspection of all its neighbouring vertices. Hence for each level $i$ and each exit
point $\dot n$, Step (3) of Table 4.3 takes time of order $O(|\Phi^{\le i}|^{1/2}\,|\Phi^i|\,|\Phi^{<i}|)$.

Note that $\sum_{i \ge 0} |\Phi^i| = |\Phi|$.
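The two ingredients discussed above, a maximum matching followed by a linear alternating search, can be sketched in Python as follows. The sketch assumes that a tight set is a subset $X \subseteq S$ maximising the deficiency $|X| - |\mathit{neigh}(X)|$; it is a generic textbook construction, not a transcription of Algorithm 4.3.1, and all function names are hypothetical. For simplicity the matching uses plain augmenting paths (Kuhn's method, $O(|V|\,|E|)$), which is slower than the $O(|V|^{1/2}\,|E|)$ bound quoted above. In the setting of Table 4.5, $S$ would hold the lower and $T$ the upper trade-off candidates, and Change would hold exactly for the neighbours of the returned set.

```python
def max_matching(S, adj):
    """Maximum bipartite matching by augmenting paths (Kuhn's method).

    adj maps each vertex of S to an iterable of neighbours in T.
    Returns a dict mapping each matched T-vertex to its S-partner.
    """
    match_t = {}

    def augment(s, seen):
        for t in adj.get(s, ()):
            if t in seen:
                continue
            seen.add(t)
            # t is free, or its current partner can be re-matched elsewhere
            if t not in match_t or augment(match_t[t], seen):
                match_t[t] = s
                return True
        return False

    for s in S:
        augment(s, set())
    return match_t


def largest_tight_set(S, T, adj):
    """Largest X subset of S maximising |X| - |neigh(X)|.

    S-vertices reachable from an unmatched T-vertex by an alternating path
    (T->S over non-matching edges, S->T over matching edges) are excluded;
    the remaining S-vertices form the largest tight set.
    """
    match_t = max_matching(S, adj)
    match_s = {s: t for t, s in match_t.items()}
    radj = {t: [] for t in T}
    for s in S:
        for t in adj.get(s, ()):
            radj[t].append(s)
    stack = [t for t in T if t not in match_t]
    seen_t = set(stack)
    reached_s = set()
    while stack:  # every edge is inspected at most once: O(|E|)
        t = stack.pop()
        for s in radj[t]:
            if s in reached_s or match_t.get(t) == s:
                continue  # already visited, or the matching edge itself
            reached_s.add(s)
            mate = match_s.get(s)  # s is matched, as the matching is maximum
            if mate is not None and mate not in seen_t:
                seen_t.add(mate)
                stack.append(mate)
    return set(S) - reached_s


def deficiency(X, adj):
    """|X| - |neigh(X)| for a subset X of S."""
    return len(X) - len({t for s in X for t in adj.get(s, ())})


# Hypothetical example: x and y compete for p, while z has two neighbours.
adj = {"x": ["p"], "y": ["p"], "z": ["q", "r"]}
tight = largest_tight_set(["x", "y", "z"], ["p", "q", "r"], adj)
```

Here `tight` is `{"x", "y"}` with deficiency 1: releasing the registers of `x` and `y` costs only the single neighbour `p`.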

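Summarising this result over all levels and nodes of the flow graph, the overall
computational complexity of step (3) of Table 4.3 is

$$O\Big(\sum_{\dot n \in \dot N} \sum_{i \ge 0} |\Phi^{\le i}|^{1/2}\,|\Phi^i|\,|\Phi^{<i}|\Big) \le O\Big(|G|\,|\Phi|^{3/2} \sum_{i \ge 0} |\Phi^i|\Big) = O(|G|\,|\Phi|^{5/2})$$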

In the case that the arities of operators are bounded by a constant, this
estimation can be improved to $O(|G|\,|\Phi|^{3/2})$, as the levelwise estimation can
be reduced to $O(|\Phi^{\le i}|^{1/2}\,|\Phi^i|)$, which is a consequence
of the fact that the number of edges between $\Theta^{dn(i)}_{\dot n}$ and $\Theta^{up(i)}_{\dot n}$ is of order $O(|\Phi^i|)$.

Summing up, we have the following result:

Theorem 4.6.2 (Complexity of $\mathrm{LFEM}_\Phi$).

1. $\mathrm{LFEM}_\Phi$ can be performed with run-time complexity of order $O(|G|\,|\Phi|^{5/2})$
for any structured set of expressions $\Phi$.

2. $\mathrm{LFEM}_\Phi$ can be performed with run-time complexity of order $O(|G|\,|\Phi|^{3/2})$
for a structured set of expressions $\Phi$ where the arities of operators are bounded
by a constant.

4.6.3 The Complexity of $\mathrm{FFEM}_\Phi$

For the worst-case complexity of $\mathrm{FFEM}_\Phi$ a similar argumentation as for $\mathrm{LFEM}_\Phi$
applies. Again the overall complexity is subsumed by the complexity of the
computation of the local trade-off information, i.e. Step (3) of Table 4.5.
Here, the only significant difference is that we can no longer exploit the boundedness
of operator arities, since the construction of edges between $\Theta^{dn}_{\dot n}$
and $\Theta^{up}_{\dot n}$ in Table 4.5(2c) introduces additional edges that also reflect mediate
subexpression relations. Thus we obtain:

By marking processed vertices, one can easily ensure that a vertex is included
in $D$ at most once.







Theorem 4.6.3 (Complexity of $\mathrm{FFEM}_\Phi$).

$\mathrm{FFEM}_\Phi$ can be performed with run-time complexity of order $O(|G|\,|\Phi|^{5/2})$ for
any structured set of expressions $\Phi$.

4.6.4 Bit-Vector Complexity of $\mathrm{BEM}_\Phi$, $\mathrm{LEM}_\Phi$, $\mathrm{LFEM}_\Phi$ and $\mathrm{FFEM}_\Phi$

Besides these pure complexity estimations, a considerable amount of research has
been put into the development of techniques that are tuned for the simultaneous
treatment of all entities under consideration. These techniques, known as
bit-vector techniques, are based on the paradigm that a whole
bit-vector operation is considered as an elementary step. This assumption is
justified by the fact that bit-vector operations can be implemented reasonably
fast for moderately sized bit-vectors on most machines. Therefore, the intent
of bit-vector algorithms is to exploit the structure of the flow graph in such a way
that the complexity of solving a problem for all objects simultaneously is
almost the same as the complexity of solving the problem for one particular
object (under the assumption of elementary bit-vector operations). In fact, a
number of algorithms exist whose complexities are almost linear in the number
of nodes of the flow graph. The superlinear behaviour is only reflected by
a factor which in most cases can be considered reasonably small in practice.
In detail, the most important techniques are:

- Iterative techniques like that of Kam and Ullman [KU76, KU77], whose run-time
complexity is of order $O(|G|\,d)$, where $d$, the depth of the flow graph,
is determined by the maximum number of back edges that can occur on
an acyclic program path.

- Node listing techniques like that of Kennedy [Ken75], whose complexity is
shown in [AU75] to be of order $O(|G| \log(|G|))$ for reducible control flow
[Hec77].

- Elimination techniques as summarised in the survey paper of Ryder and
Paull [RP88], also reaching the bound of $O(|G| \log(|G|))$ for reducible control flow.

- Path compression algorithms like that of Tarjan [Tar79, Tar81b, Tar81a],
yielding the famous $O(|G|\,\alpha(|G|))$-bound for reducible control flow, where
$\alpha$ denotes the functional inverse of Ackermann's function, which grows
extremely slowly.

In particular, even in the case of unrestricted control flow we have according
to [KU76, KU77]:

Theorem 4.6.4 (Bit-Vector Complexity of $\mathrm{BEM}_\Phi$ and $\mathrm{LEM}_\Phi$).

Both $\mathrm{BEM}_\Phi$ and $\mathrm{LEM}_\Phi$ can be computed by means of global data flow analyses
that stabilise within $d + 1$ round-robin iterations.
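As an illustration of the bit-vector paradigm, the sketch below solves a down-safety style equation for all expressions at once, encoding each predicate value as a Python integer used as a bit-vector (bit $k$ stands for the $k$-th expression), and visits the nodes round robin in a fixed order. The equation shape, the node-level granularity and the names (`round_robin_dnsafe`, `comp`, `transp`) are assumptions made for illustration; this is not one of the cited algorithms.

```python
def round_robin_dnsafe(order, succ, comp, transp, end, width):
    """Down-safety for `width` expressions at once, one bit per expression.

    DnSafe(n)   = COMP(n) | (TRANSP(n) & AND of DnSafe(m) over m in succ(n))
    DnSafe(end) = COMP(end)
    Greatest fixed point, computed round robin over `order`. Returns the
    solution and the number of passes; the last pass only detects stability.
    """
    full = (1 << width) - 1
    dnsafe = {n: full for n in order}  # optimistic start for a "must" problem
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for n in order:
            if n == end:
                new = comp[n]
            else:
                meet = full
                for m in succ[n]:
                    meet &= dnsafe[m]  # one bit-vector AND per edge
                new = comp[n] | (transp[n] & meet)
            if new != dnsafe[n]:
                dnsafe[n] = new
                changed = True
    return dnsafe, passes


# Hypothetical graph 0 -> 1 -> {2, 3} -> 4 with bit 0 = a+b, bit 1 = c+d;
# node 3 modifies an operand of c+d.
succ = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: []}
comp = {0: 0b00, 1: 0b00, 2: 0b01, 3: 0b01, 4: 0b10}
transp = {0: 0b11, 1: 0b11, 2: 0b11, 3: 0b01, 4: 0b11}
dnsafe, passes = round_robin_dnsafe([4, 3, 2, 1, 0], succ, comp, transp, 4, 2)
```

Because this backward problem is visited in a postorder of the successor relation and the graph is loop-free, one productive pass plus one pass detecting stability suffice; with loops, more passes may be needed.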

For arbitrary control flow some of these exhibit a pathological exponential
behaviour.

With the nodes of the flow graph sorted according to postorder
or reverse postorder.







Unfortunately, neither $\mathrm{LFEM}_\Phi$ nor $\mathrm{FFEM}_\Phi$ can take full advantage of bit-vector
analyses, since the trade-off algorithms are beyond the scope of bit-vector
operations. However, from a pragmatic point of view, most of the analyses
of $\mathrm{LFEM}_\Phi$ and even all of the analyses of $\mathrm{FFEM}_\Phi$ can be handled with bit-vector
techniques. In fact, as mentioned, $\mathrm{FFEM}_\Phi$ only differs from $\mathrm{LEM}_\Phi$ by the
non-bit-vector postprocess computing local trade-offs and adjusting predicates.
Therefore, an efficient implementation can exploit the available
bit-vector potential as far as possible.

The results of this section are summarised in Table 4.6. This table gives a
good impression of how the computational complexity of the different techniques is
reflected in the order of the second parameter, which may vary from almost
constant in the bit-vector case up to order $5/2$ for the flexible approaches.





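$$
\begin{array}{l|cc|c}
 & \text{pure complexity} & \text{pure complexity} & \text{bit-vector} \\
 & \text{(arities of operators bound)} & \text{(arities of operators unbound)} & \text{complexity} \\ \hline
\mathrm{BEM}_\Phi,\ \mathrm{LEM}_\Phi & O(|G|\,|\Phi|) & O(|G|\,|\Phi|) & O(|G|\,d) \\
\mathrm{LFEM}_\Phi & O(|G|\,|\Phi|^{3/2}) & O(|G|\,|\Phi|^{5/2}) & / \\
\mathrm{FFEM}_\Phi & O(|G|\,|\Phi|^{5/2}) & O(|G|\,|\Phi|^{5/2}) & /
\end{array}
$$

Table 4.6. Worst case time complexities of expression motion: A classification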





5. Expression Motion in the Presence of Critical Edges



It has been well known since Morel's and Renvoise's [MR79] pioneering work that
critical edges may cause serious problems for expression motion:

- First, the lack of suitable placement points usually leads to suboptimal
results, i.e. results that are strictly worse in terms of the number of
computations than competing results that could be obtained in a flow graph
after splitting critical edges. Figure 2.2 of Chapter 2 gives an illustration
of this phenomenon.

- Second, the equation system of Morel and Renvoise uses bidirectional data
flow analyses. In fact, the bidirectionality of their algorithm became a model
in the field of bit-vector based expression motion (cf. [Cho83, Dha88,
Dha89b, Dha91, DK93, DRZ92, DS88, JD82a, JD82b, Mor84, MR81,
Sor89]). Bidirectional algorithms, however, are in general conceptually and
computationally more complex than unidirectional ones. In particular, critical
edges prevent the application of fast unidirectional bit-vector methods,
which in the case of reducible control flow are almost linear in the program size
(cf. Section 4.6.4). In contrast, the best known estimations for bidirectional
bit-vector analyses are of order $O(|G|^2)$ (cf. [Dha91, DK93, DP93]).
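A critical edge is an edge leading from a node with more than one successor to a node with more than one predecessor. The following minimal Python sketch, which assumes a hypothetical adjacency-map representation in which every node is a key and uses synthetic names for inserted nodes, detects such edges and splits each of them by a fresh node, the standard remedy referred to in the first point above.

```python
def critical_edges(succ):
    """Edges (n, m) where n has several successors and m several predecessors."""
    preds = {n: 0 for n in succ}
    for n, ms in succ.items():
        for m in ms:
            preds[m] = preds.get(m, 0) + 1
    return [(n, m) for n, ms in succ.items() if len(ms) > 1
            for m in ms if preds[m] > 1]


def split_critical_edges(succ):
    """Return a copy of the graph with a fresh node on every critical edge."""
    result = {n: list(ms) for n, ms in succ.items()}
    for n, m in critical_edges(succ):
        fresh = ("split", n, m)  # hypothetical name for the synthetic node
        result[n] = [fresh if x == m else x for x in result[n]]
        result[fresh] = [m]
    return result


# Node 1 branches to 2 and 3, and 3 is also reached from 2:
# the edge (1, 3) is critical.
g = {1: [2, 3], 2: [3], 3: []}
h = split_critical_edges(g)
```

After splitting, the synthetic node offers a placement point that lies on the path through the former critical edge only, which is exactly what the placement problems described above lack.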

In this chapter we are going to provide a systematic approach to expression 
motion in the presence of critical edges. To this end, we investigate the im- 
pact of the presence of critical edges for both the single-expression view and 
the multiple-expression view. Whereas the “classical” deficiencies as sketched 
above all address the single-expression view, the multiple-expression case has 
not been previously explored. Surprisingly, we found that the difficulties aris- 
ing in the latter case are more serious than the “classical” ones. 



5.1 The Single-Expression View 

In this section we are going to examine the impact of critical edges on defi- 
nitions and algorithms presented in Chapter 3 for the single-expression view 
of expression motion. To this end, we are first going to provide a fresh and 
conceptually clean view on the phenomena causing the need for bidirectional 
analyses. Afterwards, we will show how to avoid bidirectionality completely 
by enriching the flow graph by (virtual) shortcut edges. 



O. Ruething: Interacting Code Motion Transformations, LNCS 1539, pp. 113-131, 1998. 
© Springer- Verlag Berlin Heidelberg 1998 
Throughout this section we adapt the definitions and notions of Chapter
3. Moreover, as in Chapter 3, the presentation is developed for a fixed flow
graph $G$ that is now assumed to be part of $\mathcal{G}_{crit}$ and a fixed expression
pattern $\varphi$.

5.1.1 Computationally Optimal Expression Motion

As opposed to flow graphs without critical edges, there are usually no
computationally optimal expression motions operating on $\mathcal{G}_{crit}$. In fact, Figure
5.1 shows two admissible expression motions of $a+b$ that are incomparable with
respect to the number of computations and cannot be improved any further. The
first one is simply given by the identity transformation of the program in
Figure 5.1(a); the result of the second one is displayed in Figure 5.1(b). Each of
the resulting programs has exactly one computation on one of the two emphasised
paths (dark and light shade of grey, respectively), while having two computations
on the other one. Thus there is no computationally optimal expression
motion with respect to the original program in Figure 5.1(a).





Fig. 5.1. Incomparable admissible program transformations 



However, in the example above the solution of Figure 5.1(b) should be
excluded as a reasonable expression motion, since this transformation would
increase the number of computations on some program path. Essentially,
this is caused by the fact that the initialisation at node 2 is not used
on the program path through nodes 3 and 6. Therefore, we restrict ourselves
to expression motions that are profitable, i.e. that introduce initialisations
only when they are actually needed on every program path originating at
the initialisation site. This in particular ensures that such a transformation is
computationally better than the identity transformation.

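Definition 5.1.1 (Profitability). $\mathrm{EM}_\varphi \in \mathcal{AEM}_\varphi$ is profitable if and only
if every initialisation is used on every program path leading to the end node,
i.e.

$$\forall \dot n \in \dot N.\ \mathrm{EM}_\varphi\text{-}\mathit{Insert}_{\dot n} \Rightarrow \forall p \in \mathbf{P}[\dot n, \dot e]\ \exists i \le |p|.\ \mathit{Replace}_{p_i} \wedge \forall\, 1 < j \le i.\ \neg\mathrm{EM}_\varphi\text{-}\mathit{Insert}_{p_j}$$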
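Definition 5.1.1 quantifies over all paths, but it can be checked without enumerating them. The Python sketch below assumes a node-level flow graph with hypothetical names; it computes for every node whether all paths from it to the end node reach a replacement before any further insertion (the greatest solution of a simple backward equation system) and then tests every insertion point.

```python
def is_profitable(nodes, succ, end, insert, replace):
    """Check Definition 5.1.1 on a node-level flow graph.

    insert, replace -- sets of nodes carrying an initialisation h := phi,
                       respectively a replaced original occurrence of phi.
    ok[n] holds iff every path p from n = p_1 to the end node contains
    some p_i with a replacement and no insertion at p_2, ..., p_i.
    """
    ok = {n: True for n in nodes}  # greatest fixed point: start optimistic
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n in replace:
                new = True    # i = 1 already witnesses the use
            elif n == end:
                new = False   # the path ends without any use
            else:
                new = all(m not in insert and ok[m] for m in succ[n])
            if new != ok[n]:
                ok[n] = new
                changed = True
    return all(ok[n] for n in insert)


# Hypothetical diamond 1 -> {2, 3} -> 4 with the original computation at 2.
succ = {1: [2, 3], 2: [4], 3: [4], 4: []}
```

An initialisation at node 1 is then not profitable, since the path through node 3 never uses it, whereas an initialisation at node 2 is. The greatest rather than the least fixed point is needed because paths may run through loops any number of times before reaching the end node.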
Let us denote the set of profitable expression motions in $\mathcal{AEM}_\varphi$ by $\mathcal{PAEM}_\varphi$.
Obviously, profitability is guaranteed for computationally optimal expression
motions out of $\mathcal{COEM}_\varphi$. Hence this condition does not impose a further
restriction for flow graphs without critical edges, where computationally
optimal expression motions are guaranteed to exist. In the presence of critical edges,
however, the restriction to $\mathcal{PAEM}_\varphi$ is necessary in order to yield
computationally optimal results at all. Then we may use almost the same definition
of computational optimality as in Chapter 3 (cf. Definition 3.2.1) in order
to define $\mathcal{COEM}^{crit}_\varphi$, the set of computationally optimal expression motions
operating on $\mathcal{G}_{crit}$. As mentioned, the only essential divergence is that the
underlying universe now is $\mathcal{PAEM}_\varphi$ instead of $\mathcal{AEM}_\varphi$.



5.1.2 Busy Expression Motion 

Unfortunately, busy expression motion as presented in Section 3.2 cannot be
adapted to flow graphs with critical edges, since it may not define a profitable
expression motion. Even worse, a naive adaptation may even cause a program
degradation, as illustrated in Figure 5.2. In this slight modification of Figure
5.1, the marked range of down-safe program points depicted in Figure 5.2(a)
would yield earliest initialisation points at nodes 1 and 5, leading to the
program displayed in Figure 5.2(b). This, however, introduces an additional
computation on the path leading through nodes 1 and 5, while no path at all
is strictly improved.




Fig. 5.2. Program degradation through a naive adaptation of busy expression motion



The key to a reasonable definition of busy expression motion on flow graphs
with critical edges is to impose a homogeneity requirement on down-safety
ensuring that the information propagates either to all or to none of the
predecessors, which guarantees that earliest program points become a proper
upper borderline of the region of safe program points. In fact, in the absence of
critical edges down-safety has the following homogeneity property:

$$\forall \dot n \in \dot N.\ \mathit{DnSafe}_{\dot n} \Rightarrow (\forall m \in \mathit{pred}(\dot n).\ \mathit{Safe}_{m}) \vee (\forall m \in \mathit{pred}(\dot n).\ \neg\mathit{DnSafe}_{m})$$
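For a given assignment of the predicates, the homogeneity property can be checked pointwise. The following sketch, with hypothetical names and plain Python dictionaries for DnSafe and Safe, lists the program points that violate it; on flow graphs with critical edges such violations can occur, which is why the property has to be enforced explicitly.

```python
def homogeneity_violations(points, pred, dnsafe, safe):
    """Program points n with DnSafe[n] for which neither
    (all predecessors are safe) nor (no predecessor is down-safe) holds."""
    return [n for n in points
            if dnsafe[n]
            and not all(safe[m] for m in pred[n])
            and not all(not dnsafe[m] for m in pred[n])]


# Hypothetical join point c with predecessors a (down-safe) and b (neither
# down-safe nor up-safe): down-safety at c reaches only some predecessors.
pred = {"a": [], "b": [], "c": ["a", "b"]}
dnsafe = {"a": True, "b": False, "c": True}
safe = {"a": True, "b": False, "c": True}
```

Calling `homogeneity_violations(["a", "b", "c"], pred, dnsafe, safe)` reports `["c"]`; if `b` were up-safe, and hence safe, the property would hold.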
Note that the first term of the disjunction uses safety rather than down-safety,
since propagating down-safety to predecessors that are up-safe anyhow is
not necessary.¹ Now this property has to be forced explicitly. For instance, in
Figure 5.2(a) the entry of node 5 as well as the exit of node 2 are down-safe,
while the exit of node 3 is not. Therefore, let us consider the following notion
of homogeneous safety:

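Definition 5.1.2 (Homogeneous Down-Safety). A predicate HDnSafe
on the program points of $\dot N$ is a homogeneous down-safety predicate if and
only if for any $\dot n \in \dot N$

1. HDnSafe is conform with down-safety:

$$\mathit{HDnSafe}_{\dot n} \Rightarrow \mathit{Comp}_{\dot n} \vee \big((\dot n \neq \dot e) \wedge \mathit{Transp}_{\dot n} \wedge \forall m \in \mathit{succ}(\dot n).\ \mathit{HDnSafe}_{m}\big)$$

2. HDnSafe is homogeneous:

$$\mathit{HDnSafe}_{\dot n} \Rightarrow \big(\forall m \in \mathit{pred}(\dot n).\ \mathit{HDnSafe}_{m} \vee \mathit{UpSafe}_{m}\big) \vee \big(\forall m \in \mathit{pred}(\dot n).\ \neg\mathit{HDnSafe}_{m}\big)$$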

Obviously, homogeneous down-safety predicates are closed under "union".²
Thus there exists a unique largest homogeneous down-safety predicate,
denoted by Hom-DnSafe, which gives rise to a homogeneous version of safety,
too:

$$\forall \dot n \in \dot N.\ \textit{Hom-Safe}_{\dot n} \Leftrightarrow \textit{Hom-DnSafe}_{\dot n} \vee \mathit{UpSafe}_{\dot n}$$

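Earliest program points are then defined along the lines of Definition 3.2.2.

Definition 5.1.3 (Homogeneous Earliestness). For any $\dot n \in \dot N$

$$\textit{Hom-Earliest}_{\dot n} \Leftrightarrow \textit{Hom-DnSafe}_{\dot n} \wedge \big((\dot n = \dot s) \vee \exists m \in \mathit{pred}(\dot n).\ (\neg\mathit{Transp}_{m} \vee \neg\textit{Hom-Safe}_{m})\big)$$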

Then busy expression motion for flow graphs with critical edges ($\mathrm{CBEM}_\varphi$) is
defined as follows:

- Insert initialisation statements $h_\varphi := \varphi$ at every program point $\dot n$
satisfying Hom-Earliest.

- Replace every original occurrence of $\varphi$ by $h_\varphi$.
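Once Hom-DnSafe and UpSafe are known, the insertion points of $\mathrm{CBEM}_\varphi$ follow pointwise from Definition 5.1.3. The sketch below is a direct transcription under the assumption that all predicates are given as Python dictionaries over program points; the function name and the example are hypothetical.

```python
def cbem_insertion_points(points, pred, start, transp, hom_dnsafe, upsafe):
    """Program points satisfying Hom-Earliest (Definition 5.1.3)."""
    hom_safe = {n: hom_dnsafe[n] or upsafe[n] for n in points}
    return {n for n in points
            if hom_dnsafe[n]
            and (n == start
                 or any(not transp[m] or not hom_safe[m] for m in pred[n]))}


# Hypothetical chain of program points 1 -> 2 -> 3 that are all transparent,
# where the expression is homogeneously down-safe from point 2 on.
pred = {1: [], 2: [1], 3: [2]}
transp = {1: True, 2: True, 3: True}
no_upsafe = {1: False, 2: False, 3: False}
points = cbem_insertion_points([1, 2, 3], pred, 1, transp,
                               {1: False, 2: True, 3: True}, no_upsafe)
```

Here `points` is `{2}`: point 2 is the first homogeneously down-safe point, since its predecessor is not safe, while point 3 is preceded by a safe and transparent point.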

As an equivalent to Theorem 3.2.1 we obtain:

Theorem 5.1.1 (Optimality Theorem for $\mathrm{CBEM}_\varphi$).

$\mathrm{CBEM}_\varphi$ is computationally optimal within $\mathcal{PAEM}_\varphi$, i.e. $\mathrm{CBEM}_\varphi \in \mathcal{COEM}^{crit}_\varphi$.

¹ In the absence of critical edges this makes no difference to requiring $\forall m \in \mathit{pred}(\dot n).\ \mathit{DnSafe}_{m}$.

² This means the predicate defined by the pointwise disjunction of the predicate
values.
5.1.3 Lazy Expression Motion

Similar to the situation in Section 5.1.2, the relevant analyses of $\mathrm{LEM}_\varphi$ as
defined in Chapter 3 cannot naively be adapted to flow graphs with critical
edges either. This even holds if the appropriate busy expression motion, that is
$\mathrm{CBEM}_\varphi$, is assumed as the basis for the delay process, as illustrated by the
example in Figure 5.3. Figure 5.3(a) already shows the result of $\mathrm{CBEM}_{a+b}$.
A naive adaptation of lazy expression motion would determine delayable program
points as emphasised in the figure. Thus initialisations at latest program points
would yield a program as displayed in Figure 5.3(b). Note, however, that this
transformation increases the number of computations of $a+b$ on the path
$(1, 3, 5, 8, \ldots)$ compared to $\mathrm{CBEM}_{a+b}$.



[Figure 5.3 (a) and (b): flow graphs with initialisations $h := a+b$]

Fig. 5.3. Program degradation through a naive adaption of lazy expression motion 



Again the reason for this behaviour lies in a homogeneity defect, but now
with respect to delayability. In fact, for flow graphs without critical edges we
have

$$\mathit{Delayed}_{\dot n} \Rightarrow (\forall m \in \mathit{succ}(\dot n).\ \mathit{Delayed}_{m}) \vee (\forall m \in \mathit{succ}(\dot n).\ \neg\mathit{Delayed}_{m})$$

This property may be violated in the presence of critical edges. For instance, in Figure 5.3(a) both the exit of node 3 and the entry of node 5 satisfy delayability, whereas the entry of node 6 does not. Hence this homogeneity property has to be enforced explicitly in order to obtain a reasonable adaptation of lazy expression motion to flow graphs with critical edges. Therefore, let us consider the following notion of homogeneous delayability.

Definition 5.1.4 (Homogeneous Delayability). A predicate HDelayed on Ṅ is a homogeneous delayability predicate if and only if for any ṅ ∈ Ṅ

1. HDelayed is conform with delayability:

$$\text{HDelayed}_{\dot n} \Rightarrow \text{Hom-Earliest}_{\dot n} \vee \Big( (\dot n \neq \dot s) \wedge \forall\, \dot m \in pred(\dot n).\; \text{HDelayed}_{\dot m} \wedge \neg\text{Comp}_{\dot m} \Big)$$

2. HDelayed is homogeneous:

$$\text{HDelayed}_{\dot n} \Rightarrow \big(\forall\, \dot m \in succ(\dot n).\; \text{HDelayed}_{\dot m}\big) \vee \big(\forall\, \dot m \in succ(\dot n).\; \neg\text{HDelayed}_{\dot m}\big)$$
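Condition 2 can be checked mechanically. The following Python sketch assumes a node-level representation in which every node has an entry and an exit point, with the predicate given as two dictionaries (`x_delayed` for exits, `n_delayed` for entries); these names and the example graph are hypothetical. Entry points need no check, since their only successor is the exit of the same node.

```python
def homogeneity_violations(x_delayed, n_delayed, succ):
    """Return the nodes whose exit point violates condition 2 of Definition 5.1.4.

    x_delayed[n] / n_delayed[n]: predicate value at the exit / entry of node n.
    succ[n]: list of successor nodes of n.
    """
    violations = []
    for n, delayed in x_delayed.items():
        # A delayed exit point must see either all successor entries delayed
        # or none of them; a mixed set of values is a homogeneity defect.
        if delayed and len({n_delayed[m] for m in succ[n]}) > 1:
            violations.append(n)
    return sorted(violations)


# Hypothetical graph: edge (2, 5) is critical (node 2 has two successors,
# node 5 has two predecessors).
succ = {1: [2, 3], 2: [4, 5], 3: [5], 4: [6], 5: [6], 6: []}

# Values a naive, non-homogeneous delayability analysis could produce.
x_naive = {1: False, 2: True, 3: False, 4: False, 5: False, 6: False}
n_naive = {1: False, 2: False, 3: True, 4: True, 5: False, 6: False}

print(homogeneity_violations(x_naive, n_naive, succ))  # [2]
```

Setting the entry of node 4 to false as well removes the defect, which mirrors how a homogeneous predicate resolves such a conflict by giving up delayability on all branches of the critical edge.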

Obviously, homogeneous delayability predicates are closed under "union". Thus there exists a unique largest homogeneous delayability predicate, which shall be denoted by Hom-Delayed. This gives rise to a new notion of latestness defined along the lines of Definition 3.3.4:

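Definition 5.1.5 (Homogeneous Latestness). For every ṅ ∈ Ṅ

$$\text{Hom-Latest}_{\dot n} \;\Longleftrightarrow\; \text{Hom-Delayed}_{\dot n} \wedge \big( \text{Comp}_{\dot n} \vee \exists\, \dot m \in succ(\dot n).\; \neg\text{Hom-Delayed}_{\dot m} \big)$$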



Then lazy expression motion for flow graphs with critical edges (CLEM_φ) is defined as follows:

— Insert initialisation statements h_φ := φ at every program point ṅ satisfying Hom-Latest.

— Replace every original occurrence of φ by h_φ.

Using the same definition of lifetime optimality as in Chapter 3 (cf. Definition 3.3.1), we succeed in proving lifetime optimality along the lines of Theorem 3.3.1.

Theorem 5.1.2 (CLEM_φ-Theorem).

CLEM_φ is lifetime optimal within the universe .



5.1.4 Computing CBEM_φ and CLEM_φ

In this section we present how busy and lazy expression motion can actually be computed for flow graphs with critical edges. In fact, the analyses presented are the first proposals for computationally and lifetime optimal expression motion in the presence of critical edges. As an achievement of our systematic approach to critical edges, we even succeed in giving two alternative approaches: (1) a classical approach via bidirectional analyses and (2) a new non-standard approach that transforms the problem into a unidirectional form.

5.1.4.1 The Classical Approach: Bidirectional Analyses. As in Section 3.5, the analyses associated with busy and lazy expression motion are summarised in the form of two tables. The most significant differences with respect to their counterparts in Table 3.1 and Table 3.2 are:
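1. Safety Analyses:

a) Up-Safety Analysis: See Figure 3.1

b) Homogeneous Down-Safety Analysis (with respect to Up-Safety)

$$\text{NDNSAFE}_n = \text{Comp}_n + \text{Transp}_n \cdot \text{XDNSAFE}_n$$

$$\text{XDNSAFE}_n = \begin{cases} \textit{false} & \text{if } n = e \\ \prod_{m \in succ(n)} \Big( \text{NDNSAFE}_m \cdot \prod_{n' \in pred(m)} \big(\text{XDNSAFE}_{n'} + \text{XUpSafe}_{n'}\big) \Big) & \text{otherwise} \end{cases}$$

Greatest fixed point solution: Hom-NDnSafe and Hom-XDnSafe

2. Computation of Homogeneous Earliestness: (No data flow analysis!)

$$\text{Hom-NEarliest}_n \overset{def}{=} \text{Hom-NDnSafe}_n \cdot \begin{cases} \textit{true} & \text{if } n = s \\ \overline{\prod_{m \in pred(n)} \big(\text{XUpSafe}_m + \text{Hom-XDnSafe}_m\big)} & \text{otherwise} \end{cases}$$

$$\text{Hom-XEarliest}_n \overset{def}{=} \text{Hom-XDnSafe}_n \cdot \overline{\text{Transp}_n}$$

3. Insertion and Replacement Points of the CBEM_φ-Transformation:

$$\text{CBEM}_\varphi\text{-NInsert}_n \overset{def}{=} \text{Hom-NEarliest}_n \qquad \text{CBEM}_\varphi\text{-XInsert}_n \overset{def}{=} \text{Hom-XEarliest}_n \qquad \text{CBEM}_\varphi\text{-Replace}_n \overset{def}{=} \text{Comp}_n$$

Table 5.1. Computing CBEM_φ: The bidirectional approach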



— Homogeneity of safety and delayability is enforced explicitly by the use of bidirectional equations, i.e. equations whose right-hand side values depend on information of both predecessors and successors.

— Homogeneous down-safety depends on up-safety. Consequently, the safety analyses have to be performed sequentially.^
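Table 5.1 can be prototyped directly. The following is a hedged Python sketch, not the book's implementation: it assumes a node-level flow graph given by `pred`/`succ` dictionaries, Boolean dictionaries for the local predicates Comp and Transp of a single expression, and the up-safety values XUpSafe as precomputed input (the up-safety analysis itself is not repeated). The greatest fixed point of step 1b is obtained by round-robin iteration starting from *true*; steps 2 and 3 are then evaluated pointwise. The function name and the example graph are hypothetical: edge (2, 3) is critical, and a+b is computed at nodes 1 and 3.

```python
def cbem_bidirectional(nodes, succ, pred, comp, transp, x_upsafe, start, end):
    """Steps 1b-3 of Table 5.1 for a single expression (node-level sketch)."""
    # Step 1b: greatest fixed point, so start from true (except at the exit of e).
    n_ds = {n: True for n in nodes}
    x_ds = {n: n != end for n in nodes}
    changed = True
    while changed:                         # round robin until nothing changes
        changed = False
        for n in reversed(nodes):          # backward problem: visit successors first
            if n == end:
                x_new = False
            else:
                # every successor entry must be down-safe, and every predecessor
                # of that successor must be up-safe or homogeneously down-safe
                x_new = all(n_ds[m] and all(x_ds[k] or x_upsafe[k] for k in pred[m])
                            for m in succ[n])
            n_new = comp[n] or (transp[n] and x_new)
            if (n_new, x_new) != (n_ds[n], x_ds[n]):
                n_ds[n], x_ds[n] = n_new, x_new
                changed = True
    # Step 2: homogeneous earliestness, evaluated pointwise (no data flow analysis).
    n_earliest = {n: n_ds[n] and (n == start or
                                  not all(x_upsafe[m] or x_ds[m] for m in pred[n]))
                  for n in nodes}
    x_earliest = {n: x_ds[n] and not transp[n] for n in nodes}
    # Step 3: insertion points ("N" = entry, "X" = exit); replacement points are Comp.
    inserts = ({("N", n) for n in nodes if n_earliest[n]}
               | {("X", n) for n in nodes if x_earliest[n]})
    return n_ds, x_ds, inserts


nodes = [0, 1, 2, 3, 4, 5]
succ = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}
pred = {0: [], 1: [0], 2: [0], 3: [1, 2], 4: [2], 5: [3, 4]}
comp = {n: n in (1, 3) for n in nodes}       # a+b computed at nodes 1 and 3
transp = {n: True for n in nodes}            # no node modifies a or b
x_upsafe = {n: n in (1, 3) for n in nodes}   # assumed result of the up-safety analysis

n_ds, x_ds, inserts = cbem_bidirectional(nodes, succ, pred, comp, transp, x_upsafe, 0, 5)
print(sorted(inserts))   # [('N', 1), ('N', 3)]
```

In this example the exit of node 1 is not homogeneously down-safe: its critical sibling, the exit of node 2, is neither up-safe nor down-safe (the path via node 4 never computes a+b), so the computation at node 3 is kept.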



® A similar phenomenon shows up for semantic expression motion, even in the 
absence of critical edges [KRS96a]. 
1. Perform steps 1) and 2) of Table 5.1.

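2. Delayability Analysis:

$$\text{NDELAYED}_n = \text{Hom-NEarliest}_n + \begin{cases} \textit{false} & \text{if } n = s \\ \prod_{m \in pred(n)} \Big( \text{XDELAYED}_m \cdot \prod_{n' \in succ(m)} \text{NDELAYED}_{n'} \Big) & \text{otherwise} \end{cases}$$

$$\text{XDELAYED}_n = \text{Hom-XEarliest}_n + \text{NDELAYED}_n \cdot \overline{\text{Comp}_n}$$

Greatest fixed point solution: Hom-NDelayed and Hom-XDelayed

3. Computation of Latestness: (No data flow analysis!)

$$\text{Hom-NLatest}_n \overset{def}{=} \text{Hom-NDelayed}_n \cdot \text{Comp}_n$$

$$\text{Hom-XLatest}_n \overset{def}{=} \text{Hom-XDelayed}_n \cdot \overline{\prod_{m \in succ(n)} \text{Hom-NDelayed}_m}$$

4. Insertion and Replacement Points of the CLEM_φ-Transformation:

$$\text{CLEM}_\varphi\text{-NInsert}_n \overset{def}{=} \text{Hom-NLatest}_n \qquad \text{CLEM}_\varphi\text{-XInsert}_n \overset{def}{=} \text{Hom-XLatest}_n \qquad \text{CLEM}_\varphi\text{-Replace}_n \overset{def}{=} \text{Comp}_n$$

Table 5.2. Computing CLEM_φ: The bidirectional approach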



However, in contrast to competing bidirectional data flow analyses proposed for expression motion [Cho83, Dha88, Dha89b, Dha91, DK93, DRZ92, DS88, JD82a, JD82b, Mor84, MR81, Sor89], the approach presented here succeeds in carrying over the main advantages of the non-critical versions of Chapter 3:



— The hierarchical construction of lazy expression motion on the basis of busy 
expression motion yields a clean separation between the primary optimisa- 
tion goal, the elimination of partially redundant expressions, and lifetime 
considerations. 

— The approach is still structurally significantly simpler than, for instance, the original proposal of Morel and Renvoise [MR79]. In fact, the conceptual view of bidirectionality as an instrument to establish homogeneity leads to a well-understood and sparse usage of this feature. The latter aspect particularly helps to avoid equations with unnecessarily intricate mutual dependencies.^
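The delayability part of Table 5.2 can be sketched in the same style. In this hedged Python illustration the homogeneous earliestness values are simply given as input (in the book they come from Table 5.1), and a flag switches between the equation of Table 5.2(2) and a naive adaptation that only inspects the predecessors, as in the non-critical setting. The function name and the graph are hypothetical, loosely modelled on the situation of Figure 5.3: edge (2, 5) is critical, the exit of node 2 is earliest (say, because node 2 modifies an operand of a+b), the entry of node 3 is earliest, and a+b is computed at nodes 3, 4 and 5.

```python
def clem_insertions(nodes, succ, pred, comp, n_earliest, x_earliest, start,
                    homogeneous=True):
    """Steps 2-4 of Table 5.2 (single expression, node-level sketch)."""
    n_del = {n: True for n in nodes}        # greatest fixed point: start from true
    x_del = {n: True for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:                      # forward problem: visit predecessors first
            if n == start:
                rest = False
            elif homogeneous:
                # Table 5.2(2): each predecessor exit is delayed and so are all
                # entries that share this predecessor (its successors)
                rest = all(x_del[m] and all(n_del[k] for k in succ[m])
                           for m in pred[n])
            else:
                # naive adaptation: ignores siblings across critical edges
                rest = all(x_del[m] for m in pred[n])
            n_new = n_earliest[n] or rest
            x_new = x_earliest[n] or (n_new and not comp[n])
            if (n_new, x_new) != (n_del[n], x_del[n]):
                n_del[n], x_del[n] = n_new, x_new
                changed = True
    # Steps 3 and 4: latestness and insertion points (no data flow analysis).
    inserts = {("N", n) for n in nodes if n_del[n] and comp[n]}
    inserts |= {("X", n) for n in nodes
                if x_del[n] and not all(n_del[m] for m in succ[n])}
    return inserts


nodes = [1, 2, 3, 4, 5, 6]
succ = {1: [2, 3], 2: [4, 5], 3: [5], 4: [6], 5: [6], 6: []}
pred = {1: [], 2: [1], 3: [1], 4: [2], 5: [2, 3], 6: [4, 5]}
comp = {n: n in (3, 4, 5) for n in nodes}
n_earliest = {n: n == 3 for n in nodes}
x_earliest = {n: n == 2 for n in nodes}

args = (nodes, succ, pred, comp, n_earliest, x_earliest, 1)
print(sorted(clem_insertions(*args)))                     # [('N', 3), ('X', 2)]
print(sorted(clem_insertions(*args, homogeneous=False)))  # [('N', 3), ('N', 4), ('X', 2)]
```

With the naive variant the path 1, 2, 4 receives two initialisations (at the exit of node 2 and at the entry of node 4), which is the kind of degradation discussed for Figure 5.3; the homogeneous variant keeps a single one.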

5.1.4.2 The Alternative Approach: Unidirectional Analyses. An important observation with respect to the bidirectional equation systems occurring in Table 5.2 is that information at a node can be influenced by information that flows along "zig-zag paths" of critical edges.® This gives rise to the idea of incorporating such zig-zag paths directly into the data flow analysis by introducing (virtual) shortcut edges between the origin and the destination of such paths. This is illustrated in Figure 5.4(a), which shows a fragment of a program that contains a nest of critical edges emphasised by thick lines.® Figure 5.4(b) shows the set of zig-zag successors of node 1 and the associated set of (virtual) shortcut edges.



Fig. 5.4. (a) Program fragment with a nest of critical edges (b) Zig-zag successors 
and virtual shortcut edges of node 1 

Formally, the zig-zag predecessors zpred(n) and zig-zag successors zsucc(n) of a node n ∈ N are defined as the smallest sets of nodes satisfying

1. a) pred(n) ⊆ zpred(n)

b) ∀ m ∈ zpred(n). pred(succ(m)) ⊆ zpred(n)

2. a) succ(n) ⊆ zsucc(n)

b) ∀ m ∈ zsucc(n). succ(pred(m)) ⊆ zsucc(n)
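The smallest sets satisfying rules 1 and 2 can be computed by a simple worklist closure. The following Python sketch (function and variable names are hypothetical) uses a small nest of critical edges; it illustrates the definition and is not meant to reproduce the setup procedure used in the book.

```python
def zigzag_sets(n, pred, succ):
    """Return (zpred(n), zsucc(n)) as the least sets closed under rules 1 and 2."""
    zpred = set(pred[n])                     # rule 1a
    work = list(zpred)
    while work:                              # rule 1b: add pred(succ(m))
        m = work.pop()
        for s in succ[m]:
            for p in pred[s]:
                if p not in zpred:
                    zpred.add(p)
                    work.append(p)
    zsucc = set(succ[n])                     # rule 2a
    work = list(zsucc)
    while work:                              # rule 2b: add succ(pred(m))
        m = work.pop()
        for p in pred[m]:
            for s in succ[p]:
                if s not in zsucc:
                    zsucc.add(s)
                    work.append(s)
    return zpred, zsucc


# Hypothetical nest of critical edges: 1->4, 2->4, 2->5, 3->5, 3->6
edges = [(1, 4), (2, 4), (2, 5), (3, 5), (3, 6)]
pred = {n: [] for n in range(1, 7)}
succ = {n: [] for n in range(1, 7)}
for a, b in edges:
    succ[a].append(b)
    pred[b].append(a)

print(sorted(zigzag_sets(1, pred, succ)[1]))  # [4, 5, 6]
print(sorted(zigzag_sets(4, pred, succ)[0]))  # [1, 2, 3]
```

Adding virtual shortcut edges from node 1 to every element of zsucc(1) lets information at node 6 reach node 1 in a single step, although the underlying undirected path 1, 4, 2, 5, 3, 6 crosses the critical edges (2, 4), (2, 5) and (3, 5).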

Although the above characterisation of zig-zag predecessors fits perfectly for the treatment of a bidirectional equation system like the one for delayability in Table 5.2, the notion of zig-zag successors cannot immediately be used for the transformation of the bidirectional equation system for safety in Table 5.1, since this equation system also incorporates information on up-safety. However, we can easily overcome this drawback by parameterising the definitions of zig-zag predecessors and zig-zag successors by a set of nodes M ⊆ N which models program points that terminate the construction of the zig-zag sets. Hence the parameterised notions of zig-zag predecessors zpred_M(n) and

Note that, for instance, in [Dha88, Dha91, DK93] equations requiring minimal and maximal fixpoint solutions are mixed in such a way that subtle difficulties in the treatment of loops show up (the hoisting-through-the-loop effect), which prevent these algorithms from reaching lifetime optimality.

® Path means here an underlying undirected path, i. e. one where the orientation 
of edges is ignored. 

® Such patterns are expected to be rare in real life programs. 
zig-zag successors zsucc_M(n) with respect to a node n and a set of nodes M ⊆ N, respectively, are defined as the smallest sets of nodes satisfying

1. a) pred(n) ⊆ zpred_M(n)

b) ∀ m ∈ zpred_M(n). pred(succ(m) \ M) ⊆ zpred_M(n)

2. a) succ(n) ⊆ zsucc_M(n)

b) ∀ m ∈ zsucc_M(n). succ(pred(m) \ M) ⊆ zsucc_M(n)
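The parameterised variants differ only in that the closure steps ignore nodes in M. A hedged Python sketch on a hypothetical zig-zag nest follows; in Table 5.3, M would be the set XUS of nodes whose exit is up-safe, because such a predecessor already satisfies the factor XDNSAFE + XUpSafe and therefore stops the zig-zag propagation.

```python
def _closure(seed, step):
    """Least superset of seed closed under step(m) for every member m."""
    result, work = set(seed), list(seed)
    while work:
        for x in step(work.pop()):
            if x not in result:
                result.add(x)
                work.append(x)
    return result


def zpred_m(n, pred, succ, M):
    # rule 1a: pred(n); rule 1b: pred(succ(m) minus M)
    return _closure(pred[n], lambda m: (p for s in succ[m] if s not in M
                                        for p in pred[s]))


def zsucc_m(n, pred, succ, M):
    # rule 2a: succ(n); rule 2b: succ(pred(m) minus M)
    return _closure(succ[n], lambda m: (s for p in pred[m] if p not in M
                                        for s in succ[p]))


edges = [(1, 4), (2, 4), (2, 5), (3, 5), (3, 6)]   # hypothetical nest
pred = {n: [] for n in range(1, 7)}
succ = {n: [] for n in range(1, 7)}
for a, b in edges:
    succ[a].append(b)
    pred[b].append(a)

print(sorted(zsucc_m(1, pred, succ, set())))  # [4, 5, 6]  (M empty: plain zsucc)
print(sorted(zsucc_m(1, pred, succ, {2})))    # [4]        (node 2 cuts the zig-zag)
```

With M = ∅ the parameterised sets coincide with the unparameterised ones, so a single implementation can serve both Table 5.3 and Table 5.4.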

Based upon this notion, the analyses of Table 5.1 and Table 5.2 can be reformulated such that the equations have unidirectional character (see Table 5.3 and Table 5.4), strictly speaking, however, on the flow graph enlarged by (virtual) shortcut edges. Nonetheless, we will see in Section 5.2.3 that this procedure significantly improves on the bit-vector complexity compared to the counterpart based on bidirectional bit-vector analyses. In particular, this algorithm yields the first estimation that isolates the extra costs imposed by critical edges by means of a factor that is a structural property of the flow graph under consideration.



5.1.5 The Complexity 

The well-known fact that bidirectional analyses in expression motion are computationally more complex than unidirectional analyses [Dha88, DK93, KD94, DP93] does not show up unless structural properties of the flow graph come into play through the consideration of bit-vector steps. As a consequence, in the single-expression view critical edges do not add to the complexity, a fact that was first noticed by Khedker and Dhamdhere [KD94] and more recently by Masticola et al. [MMR95]. Essentially, this is because the fixed point algorithms for bidirectional equation systems are only slightly more complicated than their unidirectional counterparts, which is reflected in a more general update mechanism for the current node chosen from the workset: here predecessors as well as successors have to be updated and possibly added to the workset. For instance, Algorithm 5.1.1 in Table 5.5 shows the fixed point computation for down-safety as defined by the bidirectional equation system of Table 5.1(1b), which differs from Algorithm 3.6.1 only by the extended environment for updates of the workset. In fact, as the counterpart to Theorem 3.6.1 we have:

Theorem 5.1.3 (Complexity of CBEM_φ and CLEM_φ). CBEM_φ and CLEM_φ can both be performed with run-time complexity of order O(|G|).



^ The auxiliary predicate is used to witness the XSAFE-values of the predecessor nodes: it is set to true if and only if all predecessors satisfy XDNSAFE.
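1. Safety Analyses:

a) Up-Safety Analysis: See Figure 3.1

b) Homogeneous Safety Analysis (with respect to Up-Safety)
Set XUS = {m ∈ N | XUpSafe_m}.

$$\text{NDNSAFE}_n = \text{Comp}_n + \text{Transp}_n \cdot \text{XDNSAFE}_n$$

$$\text{XDNSAFE}_n = \begin{cases} \textit{false} & \text{if } n = e \\ \prod_{m \in zsucc_{XUS}(n)} \text{NDNSAFE}_m & \text{otherwise} \end{cases}$$

Greatest fixed point solution: Hom-NDnSafe and Hom-XDnSafe

2. Computation of Homogeneous Earliestness: (No data flow analysis!)

$$\text{Hom-NEarliest}_n \overset{def}{=} \text{Hom-NDnSafe}_n \cdot \begin{cases} \textit{true} & \text{if } n = s \\ \overline{\prod_{m \in pred(n)} \big(\text{XUpSafe}_m + \text{Hom-XDnSafe}_m\big)} & \text{otherwise} \end{cases}$$

$$\text{Hom-XEarliest}_n \overset{def}{=} \text{Hom-XDnSafe}_n \cdot \overline{\text{Transp}_n}$$

3. Insertion and Replacement Points of the CBEM_φ-Transformation:

$$\text{CBEM}_\varphi\text{-NInsert}_n \overset{def}{=} \text{Hom-NEarliest}_n \qquad \text{CBEM}_\varphi\text{-XInsert}_n \overset{def}{=} \text{Hom-XEarliest}_n \qquad \text{CBEM}_\varphi\text{-Replace}_n \overset{def}{=} \text{Comp}_n$$

Table 5.3. Computing CBEM_φ: The unidirectional variant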
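With zig-zag successors precomputed, step 1b of Table 5.3 becomes an ordinary backward problem on the graph enlarged by virtual shortcut edges. The hedged Python sketch below uses hypothetical names and a hypothetical graph (edge (2, 3) critical, a+b computed at nodes 1 and 3, up-safety supplied as input); for this graph it yields the same values as the bidirectional equations of Table 5.1.

```python
def zsucc_m(n, pred, succ, M):
    """zsucc_M(n): least set containing succ(n), closed under succ(pred(m) minus M)."""
    result, work = set(succ[n]), list(succ[n])
    while work:
        m = work.pop()
        for p in pred[m]:
            if p in M:
                continue                     # up-safe exits stop the zig-zag
            for s in succ[p]:
                if s not in result:
                    result.add(s)
                    work.append(s)
    return result


def hom_down_safety_unidirectional(nodes, succ, pred, comp, transp, x_upsafe, end):
    xus = {m for m in nodes if x_upsafe[m]}                  # XUS = {m | XUpSafe_m}
    zs = {n: zsucc_m(n, pred, succ, xus) for n in nodes}     # virtual shortcut edges
    n_ds = {n: True for n in nodes}                          # greatest fixed point
    x_ds = {n: n != end for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in reversed(nodes):
            x_new = n != end and all(n_ds[m] for m in zs[n])  # now unidirectional
            n_new = comp[n] or (transp[n] and x_new)
            if (n_new, x_new) != (n_ds[n], x_ds[n]):
                n_ds[n], x_ds[n] = n_new, x_new
                changed = True
    return n_ds, x_ds


nodes = [0, 1, 2, 3, 4, 5]
succ = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}
pred = {0: [], 1: [0], 2: [0], 3: [1, 2], 4: [2], 5: [3, 4]}
comp = {n: n in (1, 3) for n in nodes}
transp = {n: True for n in nodes}
x_upsafe = {n: n in (1, 3) for n in nodes}   # assumed up-safety result

n_ds, x_ds = hom_down_safety_unidirectional(nodes, succ, pred, comp, transp, x_upsafe, 5)
print([n for n in nodes if n_ds[n]])   # [1, 3]
```

The zig-zag sets are computed once as a setup step; afterwards each equation only reads values along (virtual) successor edges, which is what makes a plain backward round-robin schedule applicable.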
1. Perform steps 1) and 2) of Table 5.3.

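2. Delayability Analysis:

$$\text{NDELAYED}_n = \text{Hom-NEarliest}_n + \begin{cases} \textit{false} & \text{if } n = s \\ \prod_{m \in zpred(n)} \text{XDELAYED}_m & \text{otherwise} \end{cases}$$

$$\text{XDELAYED}_n = \text{Hom-XEarliest}_n + \text{NDELAYED}_n \cdot \overline{\text{Comp}_n}$$

Greatest fixed point solution: Hom-NDelayed and Hom-XDelayed

3. Computation of Latestness: (No data flow analysis!)

$$\text{Hom-NLatest}_n \overset{def}{=} \text{Hom-NDelayed}_n \cdot \text{Comp}_n$$

$$\text{Hom-XLatest}_n \overset{def}{=} \text{Hom-XDelayed}_n \cdot \overline{\prod_{m \in succ(n)} \text{Hom-NDelayed}_m}$$

4. Insertion and Replacement Points of the CLEM_φ-Transformation:

$$\text{CLEM}_\varphi\text{-NInsert}_n \overset{def}{=} \text{Hom-NLatest}_n \qquad \text{CLEM}_\varphi\text{-XInsert}_n \overset{def}{=} \text{Hom-XLatest}_n \qquad \text{CLEM}_\varphi\text{-Replace}_n \overset{def}{=} \text{Comp}_n$$

Table 5.4. Computing CLEM_φ: The unidirectional variant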
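Analogously, step 2 of Table 5.4 only needs the plain zig-zag predecessors. In the following hedged Python sketch the earliestness values are given as input, and the names and graph are hypothetical (edge (2, 5) critical, exit of node 2 and entry of node 3 earliest, a+b computed at nodes 3, 4 and 5). For this graph the resulting insertion points coincide with those determined by the bidirectional system of Table 5.2: the exit of node 2 and the entry of node 3.

```python
def zpred(n, pred, succ):
    """zpred(n): least set containing pred(n), closed under pred(succ(m))."""
    result, work = set(pred[n]), list(pred[n])
    while work:
        m = work.pop()
        for s in succ[m]:
            for p in pred[s]:
                if p not in result:
                    result.add(p)
                    work.append(p)
    return result


def clem_unidirectional(nodes, succ, pred, comp, n_earliest, x_earliest, start):
    zp = {n: zpred(n, pred, succ) for n in nodes}       # setup of shortcut edges
    n_del = {n: True for n in nodes}                     # greatest fixed point
    x_del = {n: True for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:                                  # forward round robin
            rest = n != start and all(x_del[m] for m in zp[n])
            n_new = n_earliest[n] or rest
            x_new = x_earliest[n] or (n_new and not comp[n])
            if (n_new, x_new) != (n_del[n], x_del[n]):
                n_del[n], x_del[n] = n_new, x_new
                changed = True
    # latestness and insertion points ("N" = entry, "X" = exit)
    inserts = {("N", n) for n in nodes if n_del[n] and comp[n]}
    inserts |= {("X", n) for n in nodes
                if x_del[n] and not all(n_del[m] for m in succ[n])}
    return inserts


nodes = [1, 2, 3, 4, 5, 6]
succ = {1: [2, 3], 2: [4, 5], 3: [5], 4: [6], 5: [6], 6: []}
pred = {1: [], 2: [1], 3: [1], 4: [2], 5: [2, 3], 6: [4, 5]}
comp = {n: n in (3, 4, 5) for n in nodes}
n_earliest = {n: n == 3 for n in nodes}
x_earliest = {n: n == 2 for n in nodes}

result = clem_unidirectional(nodes, succ, pred, comp, n_earliest, x_earliest, 1)
print(sorted(result))   # [('N', 3), ('X', 2)]
```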
Algorithm 5.1.1.

Input: Annotation of N with predicates NSAFE and XSAFE and an auxiliary predicate PRED-XSAFE being initialised as follows:^

  NSAFE_n = PRED-XSAFE_n = true
  XSAFE_n = false if n = e, true otherwise

Output: Maximal solution to the equation system of Table 5.1(1b).

workset := N;
while workset ≠ ∅ do
  let n ∈ workset;
  workset := workset \ {n};
  if ¬XSAFE_n
  then { Update Successor Nodes }
    forall m ∈ succ(n) do
      if PRED-XSAFE_m
      then
        PRED-XSAFE_m := false;
        workset := workset ∪ {m}
      fi
    od
  fi;
  if NSAFE_n
  then { Local Semantics }
    NSAFE_n := NUpSafe_n + Comp_n + XSAFE_n · Transp_n
  fi;
  if ¬(NSAFE_n ∧ PRED-XSAFE_n)
  then { Update Predecessor Nodes }
    forall m ∈ pred(n) do
      if XSAFE_m ∧ ¬XUpSafe_m
      then
        XSAFE_m := false;
        workset := workset ∪ {m}
      fi
    od
  fi
od

Table 5.5. Bidirectional computation of safety
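Algorithm 5.1.1 translates almost line by line into Python. The sketch below is a hedged transcription: it assumes a node-level graph, takes the up-safety values NUpSafe and XUpSafe as input, and uses a Python set as the workset, so nodes are processed in an arbitrary order; since all updates only switch values from true to false, the result should not depend on that order. The function name and the example graph are hypothetical.

```python
def safety_worklist(nodes, succ, pred, comp, transp, n_upsafe, x_upsafe, end):
    """Transcription of Algorithm 5.1.1 (Table 5.5)."""
    nsafe = {n: True for n in nodes}
    pred_xsafe = {n: True for n in nodes}   # true iff all predecessors have XSAFE
    xsafe = {n: n != end for n in nodes}
    workset = set(nodes)
    while workset:
        n = workset.pop()
        if not xsafe[n]:                     # update successor nodes
            for m in succ[n]:
                if pred_xsafe[m]:
                    pred_xsafe[m] = False
                    workset.add(m)
        if nsafe[n]:                         # local semantics
            nsafe[n] = n_upsafe[n] or comp[n] or (xsafe[n] and transp[n])
        if not (nsafe[n] and pred_xsafe[n]):  # update predecessor nodes
            for m in pred[n]:
                if xsafe[m] and not x_upsafe[m]:
                    xsafe[m] = False
                    workset.add(m)
    return nsafe, xsafe


# Hypothetical graph: edge (2, 3) is critical; a+b is computed at nodes 1 and 3.
nodes = [0, 1, 2, 3, 4, 5]
succ = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}
pred = {0: [], 1: [0], 2: [0], 3: [1, 2], 4: [2], 5: [3, 4]}
comp = {n: n in (1, 3) for n in nodes}
transp = {n: True for n in nodes}
n_upsafe = {n: False for n in nodes}          # assumed up-safety results
x_upsafe = {n: n in (1, 3) for n in nodes}

nsafe, xsafe = safety_worklist(nodes, succ, pred, comp, transp, n_upsafe, x_upsafe, 5)
print(sorted(n for n in nodes if nsafe[n]))   # [1, 3]
```

As in the algorithm, a node is re-examined only when one of its own values or its auxiliary PRED-XSAFE flag changes, so each node enters the workset a bounded number of times, in line with the linear bound of Theorem 5.1.3.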
5.2 The Multiple-Expression View

In this section we focus on important aspects that arise when combining the components of Section 5.1 into a transformation that deals with multiple expressions of a flat or structured universe of expressions. It will turn out that computational optimality is not affected by such a combination. In contrast, we have to give up on a reasonable notion of lifetime optimality, since the presence of critical edges is a serious source of conflicts when globalising the local information on profitable lifetime trade-offs. In fact, this disappointing result is new and gives stronger evidence for the importance of splitting critical edges than the extensively discussed problems arising from slow convergence of bidirectional data flow analyses, which are even diminished in the light of the new unidirectional alternatives presented in Table 5.3 and Table 5.4.



5.2.1 Flat Expression Motion 

As in Chapter 4 it is trivial to extend the single-expression algorithms of the 
previous section towards algorithms that cope with all expressions of a flat 
universe of expressions simultaneously. 

5.2.2 Structured Expression Motion 

5.2.2.1 Computationally Optimal Expression Motion. As in the situation for flow graphs without critical edges, busy and lazy expression motion can easily be shown to be computationally optimal. This is due to the fact that the essential structural properties of safety and delayability carry over to the homogeneous versions of the predicates. Hence, as counterparts to Lemma 4.2.2 and Lemma 4.2.5, we have:

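Lemma 5.2.1 (Structured Homogeneous Safety Lemma).

$$\forall\, \psi \in \Phi,\ \varphi \in SubExpr_\Phi(\psi),\ \dot n \in \dot N.\quad \text{Hom-DnSafe}^{\psi}_{\dot n} \Rightarrow \text{Hom-DnSafe}^{\varphi}_{\dot n}$$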

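Lemma 5.2.2 (Structured Homogeneous Delayability Lemma).

Let φ ∈ SubExpr_Φ(ψ) and ṅ ∈ Ṅ. Then

1. $\text{Hom-Delayed}^{\psi}_{\dot n} \Rightarrow \text{Hom-Delayed}^{\varphi}_{\dot n} \vee \text{CLEM}_\varphi\text{-Correct}_{\dot n}$

2. $\text{Hom-Delayed}^{\psi}_{\dot n} \wedge \text{Hom-DnSafe}^{\varphi}_{\dot n} \Rightarrow \text{Hom-Delayed}^{\varphi}_{\dot n}$

3. $\text{Hom-Delayed}^{\psi}_{\dot n} \wedge \neg\text{Hom-Earliest}^{\psi}_{\dot n} \Rightarrow \neg\text{Hom-Earliest}^{\varphi}_{\dot n}$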

Then along the lines of Theorem 4.2.1 and Theorem 4.2.2 we may show the 
following result. 
Theorem 5.2.1 (Homogeneous BEM and LEM Theorem).

Both CBEM_Φ and CLEM_Φ are

1. admissible, i.e. CBEM_Φ, CLEM_Φ ∈ AEM_Φ

2. profitable, i.e. CBEM_Φ, CLEM_Φ ∈ PAEM_Φ

3. computationally optimal, i.e. CBEM_Φ, CLEM_Φ ∈ COEM_Φ

At this point, however, it should be mentioned that the component transfor- 
mations have to be chosen carefully in order to yield a structured expression 
motion at all. Drechsler and Stadel [DS88], for instance, observed that the 
original proposal of Morel and Renvoise [MR79] is not suitable as a basis 
for an algorithm that operates on structured sets of expressions, since their 
equation system allows pathological cases where subexpressions cannot be 
hoisted in front of their superexpressions. 

5.2.2.2 Structured Lifetimes in the Presence of Critical Edges. Unfortunately, the most serious drawback of critical edges comes into play when lifetimes of temporaries are taken into account. In fact, as opposed to the situation in Chapter 4, in general neither lifetime optimal nor inductively lifetime optimal results exist any more. This is illustrated by means of Figure 5.5. The point of this example is that the subexpressions φ1 and φ2 are forced to be initialised on the path (1, 3, 5, 7, ...), but not on the path (2, 4, 5, 7, ...).




Fig. 5.5. Problems with trade-offs of lifetimes and critical edges



Therefore, a trade-off between the lifetime ranges of the temporary associated with φ1 + φ2 and the temporaries associated with φ1 and φ2 is only profitable on the former path. However, the critical edge (4, 5) prevents the situations on both paths from being properly decoupled. As a consequence, there are two alternatives for the initialisation of the temporary associated with φ1 + φ2 that are not comparable in terms of the lifetime better order introduced in Definition 4.2.3. Figure 5.6(a) shows an early initialisation of h3 at node 3 that reduces the number of lifetime ranges at the exit of this node to one. However, this also forces an initialisation of h3 at node 4, which increases the number of lifetime ranges at the exit of this node to three.®



Note that the values of the subexpressions have to be kept for the initialisation of h4, which cannot be done before the entry of node 6.
On the other hand, Figure 5.6(b) shows the situation if h3 is initialised at node 5. Though in this case the profitable trade-off between lifetime ranges at node 3 is missed, this solution requires exactly two lifetime ranges at the exits of both node 3 and node 4. Hence the result is actually incomparable to the transformation of Figure 5.6(a).




Fig. 5.6. Incomparable lifetime minimal expression motions 



Finally, this example provides the opportunity to elucidate the advantage of edge splitting by investigating the situation where, in the example of Figure 5.5, the critical edge is split by inserting a synthetic node. Then the lifetime optimal solution of Section 4.5 can exploit its full power by using profitable lifetime trade-offs as early as possible. This is shown in Figure 5.7. In fact, since conflicts are now completely resolved, the number of lifetime ranges active at the exits of nodes 3 and 4 is one and two, respectively, which strictly improves upon both results of Figure 5.6.

5.2.3 The Complexity 

Due to the difficulties arising with lifetime considerations, we only consider the multiple-expression versions of CBEM_Φ and CLEM_Φ. As in Chapter 4, there is no difference in complexity between the flat and the structured versions. Obviously, from a pure point of view® a consequence of Theorem 5.1.3 is:

Theorem 5.2.2 (Complexity of CBEM_Φ and CLEM_Φ). CBEM_Φ and CLEM_Φ can be performed with run-time complexity of order O(|G| · |Φ|) for both a flat and a structured universe of expressions.



® That means, without an assumption of elementary bit-vector operations. 
Fig. 5.7. Lifetime optimal expression motion after splitting of the critical edge



More interesting, however, is the question of the bit-vector complexities of the algorithms under consideration. Here, the well-known deficiency of bidirectional analyses shows up.

5.2.3.1 The Bidirectional Variants. In contrast to their unidirectional counterparts, which are reported to stabilise within a number of bit-vector steps that is almost linear in the size of the program (cp. Theorem 4.6.4), no corresponding result is available for bidirectional analyses. Recently, Dhamdhere and Khedker [DK93, KD94] came up with an estimation for bidirectional data flow analyses given in terms of the width of a flow graph.

Theorem 5.2.3 (Complexity of bidirectional CBEM_Φ and CLEM_Φ).

The (partly) bidirectional equation systems for CBEM_Φ and CLEM_Φ of Table 5.1 and Table 5.2 can be solved with w + 1 round robin iterations, where the width w denotes the maximal number of non-conform edge traversals on an information flow path.^°

Note, however, that the width of a flow graph is not a structural property of the flow graph, but varies with the bit-vector problem under consideration. In particular, it is larger for bidirectional problems than for unidirectional ones, and in the worst case it is linear in the size of the flow graph. Even worse, for the bidirectional equation systems used in expression motion, like ours in Table 5.1(1b) and Table 5.2(2), the width indeed usually grows linearly with the program size. This is even true for acyclic programs. We shall discuss this behaviour by means of our forwards directed bidirectional equation

Informally, an information flow path is a sequence of backwards or forwards directed edges along which a change of information can be propagated. A forward traversal along a forward edge or a backward traversal along a backward edge is conform with a round robin method proceeding (forwards) in reverse postorder. The other two kinds of traversals are non-conform. Complementary notions apply to round robin iterations proceeding in postorder.
system for delayability (cp. Table 5.2(2)). To this end we consider the program fragment sketched in Figure 5.8. This program fragment is an information flow path for delayability where the non-conform edge traversals are emphasised by a grey circle.




Fig. 5.8. Width of a flow graph: information flow path with non-conform edge traversals



Hence the width of a flow graph with such a fragment depends linearly on the "length" of the fragment. It should be noted that such programs are by no means pathological, and thus a linear growth of the width is not unlikely for real life programs. In fact, the large width in this case is actually reflected in a poor behaviour of a round robin iteration schedule. Figure 5.9(a) shows how delayability information slowly propagates along this "path", being stopped in each iteration at a non-conform (critical) edge.^^

5.2.3.2 The Unidirectional Variants. The drawbacks of bidirectional bit-vector analyses presented in the previous section can be avoided by using the unidirectional variants of Table 5.3 and Table 5.4. Considering again the path of Figure 5.8, the unidirectional variant can take advantage of the shortcut edges, yielding stabilisation after the very first iteration as displayed in Figure 5.9(b). In general, we have the following result.

Theorem 5.2.4 (Complexity of unidirectional CBEM_Φ and CLEM_Φ).

The unidirectional variants of CBEM_Φ and CLEM_Φ of Table 5.3 and Table 5.4 can be solved with d' + 1 round robin iterations, where d' denotes the depth (cp. Section 4.6.4) of the flow graph after adding virtual shortcut edges.

As mentioned, for real life programs we do not expect pathological nests of 
critical edges as in Figure 5.4. For this reason both the additional setup-costs 

Shaded circles indicate the flow of information along the "path". Actually, this means the propagation of the Boolean value "false", as the equation system in Table 5.2(2) computes the largest solution.



Fig. 5.9. (a) Slow convergence due to bidirectional data flow analysis (b) Fast 
convergence with unidirectional data flow analysis using shortcut edges 



for determining zig-zag predecessors and zig-zag successors at a node as well as the growth of the parameter d' in comparison to d can be expected to be bounded by a small constant. However, in [Rüt98a] we further investigate a hybrid iteration strategy that does not actually add shortcut edges but rather intermixes a round robin schedule on the original flow graph with zig-zag propagation of information along critical edges.



Overview 



Assignment Motion 

Assignment motion is a technique that complements expression motion as presented in the first part of this monograph by incorporating the movement of left-hand side variables of assignments as well. Although such an extension may seem straightforward at first glance, the movement of left-hand sides of assignments has serious consequences, which are the subject of this part. In contrast to expression motion, which has been studied thoroughly in program optimisation [Cho83, Dha83, Dha88, Dha89b, Dha91, DS88, DS93, JD82a, JD82b, Mor84, MR79, MR81, Sor89], the work on assignment motion based algorithms is quite limited. Rosen, Wegman and Zadeck [RWZ88] mention IBM's PL.8 compiler [AH82] for an iterated application of partial redundancy elimination. However, they give neither details on this extension nor a first-hand reference. The only explicit description of an extension of partial redundancy elimination towards assignments is given by Dhamdhere [Dha91]. However, his proposal does not recognise the full potential that lies in such an extension. This part of the book provides a systematic approach to the phenomena associated with program transformations based upon assignment motion.

Second Order Effects 

The most striking phenomenon of assignment motion based transformations is their second order effects [RWZ88, DRZ92]: one transformation may provide opportunities for others. As a consequence, assignment motion based transformations usually do not stabilise after their first application. However, the impact of second order effects is twofold: on the one hand, the optimisation potential increases significantly, as even a few first order opportunities may cause numerous second order opportunities. On the other hand, we are faced with the problem that exhausting the optimisation potential completely requires iterating the component transformations involved. This raises issues of confluence and complexity of the process, which will be of major concern in this part.

Assignment Hoisting and Assignment Sinking 

For expression motion the direction of code movement is determined by definition, since initialisation sites have to precede their corresponding use sites. Hence expression motion stands as a synonym for expression hoisting. In contrast, assignment motion is not restricted in this way. Both alternatives, the hoisting and the sinking of assignments, are reasonable directions of code movement. This symmetry was the inspiration to develop an algorithm for partial dead code elimination [KRS94b], which complements partial redundancy elimination. While partial redundancy elimination hoists expressions to places where they become redundant with respect to others, partial dead code elimination rests on the idea of sinking assignments to places where they become entirely dead. This transformation can even be further strengthened by eliminating faint assignments [HDT87, GMW81] rather than only dead ones.

On the other hand, assignment hoisting can be employed equally well in order to eliminate partially redundant assignments. More interesting, however, is the role of assignment hoisting as a catalyst for enhancing the potential of expression motion, which we demonstrated in an algorithm for the uniform elimination of partially redundant expressions and assignments [KRS95]. Here this approach will be further improved by adapting the techniques for minimising lifetime ranges of Chapter 4.



A Uniform Framework 

Common to assignment motion based program transformations is the fact 
that they combine a set of admissible assignment motions with a set of cor- 
responding eliminations. Formalising this situation, we present a uniform 
framework for assignment motion based program transformations that is ap- 
plicable in all practically relevant situations. In this setting we provide simple 
criteria that grant confluence and fast convergence of the exhaustive appli- 
cation of elementary transformations. 



Structure of the Part 

In Chapter 6 we give an overview of the most relevant applications in the field of assignment motion. In particular, we address the main applications of assignment motion: partial dead (faint) code elimination and the uniform elimination of assignments and expressions. Afterwards, Chapter 7 presents our general framework for assignment motion based transformations. In symmetry to the first part, the final Chapter 8 is again devoted to issues caused by the presence of critical edges.



Conventions 

As in the first part of the book, we mainly consider flow graphs whose nodes are elementary statements. However, we sometimes switch to the basic block representation, because the basic blocks of a program are structurally invariant during a sequence of transformation steps, whereas the positions of elementary statements may change significantly. Throughout this
part we assume a fixed flow graph G which is assumed to be element of 
in Chapter 6 and Chapter 7 and to be element of S'®crit in Chapter 8. 




6. Program Transformations Based on Assignment Motion



After introducing the central notion of the second part, assignment motion, we sketch the main applications of assignment motion based program transformations: the elimination of partially dead (faint) assignments (cf. [KRS94b]), the elimination of partially redundant assignments and the uniform elimination of partially redundant assignments and expressions (cf. [KRS95]). It should be noted that the reasoning on properties of these transformations is postponed to Chapter 7, where a uniform framework for assignment motion based program transformations is presented.



6.1 Assignment Motion 

In this section we first introduce the notion of assignment motion as a counterpart to expression motion (cf. Definition 3.1.1). In contrast to expression motion, however, (admissible) assignment motion only aims at the pure movement and does not permit any program improvement on its own. Moreover, assignment motion is relevant in both the forward and the backward direction.

Definition 6.1.1 (Assignment Motion). For α ∈ AP(G) we define:

1. an assignment sinking with respect to α is a program transformation that

a) eliminates some original occurrences of α and

b) inserts some instances of α at program points following an elimination point^

2. an assignment hoisting with respect to α is a program transformation that

a) eliminates some original occurrences of α and

b) inserts some instances of α at program points preceding an elimination point^
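The positional condition in parts 1b and 2b is purely path-based and can be checked by reachability. The following Python sketch is a simplification at node granularity: entry and exit points are not distinguished, and a node only counts as following (preceding) an elimination node if it is reachable from it via at least one edge. The function name and the graph are hypothetical, and admissibility (Definition 6.1.2) is not checked.

```python
def on_some_path_from(step, eliminated, inserted):
    """True if every inserted node is reachable from some eliminated node via
    step (succ for an assignment sinking, pred for an assignment hoisting)."""
    reached = set()
    work = [x for e in eliminated for x in step[e]]
    while work:
        n = work.pop()
        if n not in reached:
            reached.add(n)
            work.extend(step[n])
    return set(inserted) <= reached


# Hypothetical graph: 1 -> 2, 2 -> 3, 2 -> 4
succ = {1: [2], 2: [3, 4], 3: [], 4: []}
pred = {1: [], 2: [1], 3: [2], 4: [2]}

print(on_some_path_from(succ, {2}, {3, 4}))  # True: positions allowed for a sinking
print(on_some_path_from(succ, {2}, {1}))     # False: node 1 precedes node 2
print(on_some_path_from(pred, {2}, {1}))     # True: allowed for a hoisting
```

As the footnote to the definition notes, this condition is deliberately liberal; it only fixes the direction of movement.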

The fact that G' results from G by an assignment sinking with respect to α ∈ AP(G) shall be expressed by the notion G G'. Accordingly, for

^ That means, program points which follow (precede) the elimination point on 
some program path. Note that this condition is quite liberal, but will be further 
constrained when considering admissible code motion (cf. Definition 6.1.2). 
assignment hoistings the notion G G' is used. For convenience, for a given assignment motion like AS_α = G G', a function-like notation AS_α(G) is used in order to refer to the program G'.

An assignment motion can be completely characterised by means of its insertion points and its elimination points. For instance, for an assignment sinking AS_α and ṅ ∈ Ṅ we have:

AS_α-Insert_ṅ: an instance of α is inserted at ṅ.

AS_α-Remove_ṅ: an original occurrence of α is removed from ṅ.^

Local Predicates⁴

For every node $n \in N$ and every assignment pattern $\alpha = x := \varphi \in \mathcal{AP}(G)$ a number of local predicates is defined. To this end, let us further assume that $n$ is associated with an instruction pattern $\beta_n$ with right-hand side expression $\varphi_{\beta_n}$ and left-hand side variable $x_{\beta_n}$.

$LhsMod_n$: $n$ modifies the left-hand side variable of $\alpha$, i.e. $x_{\beta_n} = x$

$RhsMod_n$: $n$ modifies the right-hand side expression of $\alpha$, i.e. $x_{\beta_n} \in SubExpr^*(\varphi)$

$AssMod_n$: $n$ modifies an operand of $\alpha$, i.e. $LhsMod_n \vee RhsMod_n$

$LhsUsed_n$: $n$ uses the left-hand side variable of $\alpha$, i.e. $x \in SubExpr^*(\varphi_{\beta_n})$

$AssOcc_n$: the assignment pattern $\alpha$ occurs at $n$, i.e. $\beta_n = \alpha$

$Blocked_n$: $n$ blocks the movement of $\alpha$, since the ordering of both instructions must not be changed, i.e. $AssMod_n \vee LhsUsed_n$

All local predicates are naturally extended to program points: entry properties are directly inherited from the properties of the associated nodes, while exit properties are uniformly set to false.
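As an illustration, the local predicates can be transcribed almost literally into code. The following Python sketch is not taken from the book: the `Instr` record (left-hand side variable, right-hand side text, and the set of variables occurring on the right-hand side, standing in for the variables of $SubExpr^*$) and the function `local_predicates` are hypothetical. It computes the entry properties of a node; exit properties are simply false.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction pattern `lhs := rhs`.

    lhs is None if there is no left-hand side (e.g. out(...));
    rhs_vars are the variables occurring in the right-hand side.
    """
    lhs: Optional[str]
    rhs: str
    rhs_vars: FrozenSet[str]


def local_predicates(node: Instr, alpha: Instr) -> Dict[str, bool]:
    """Entry properties of a node with respect to alpha = x := phi."""
    x = alpha.lhs
    lhs_mod = node.lhs is not None and node.lhs == x               # x_beta = x
    rhs_mod = node.lhs is not None and node.lhs in alpha.rhs_vars  # x_beta is an operand of phi
    lhs_used = x in node.rhs_vars                                  # x occurs in phi_beta
    ass_mod = lhs_mod or rhs_mod
    return {
        "LhsMod": lhs_mod,
        "RhsMod": rhs_mod,
        "AssMod": ass_mod,
        "LhsUsed": lhs_used,
        "AssOcc": node == alpha,                                   # beta_n = alpha
        "Blocked": ass_mod or lhs_used,
    }


alpha = Instr("x", "a + b", frozenset({"a", "b"}))
print(local_predicates(Instr("a", "1", frozenset()), alpha)["Blocked"])  # True: a is an operand
```

An output statement such as `Instr(None, "x", frozenset({"x"}))` is blocked only through $LhsUsed$, while an unrelated assignment `y := c` leaves all predicates false.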

6.1.1 Admissibility 

As for expression motion, we have to impose additional constraints on assignment motion in order to guarantee that the semantics of the argument program is preserved. However, we cannot naively adopt the same constraints as in expression motion. The requirement that no new expression patterns are inserted on a program path, for instance, would be far too liberal when generalised to assignments, since the insertion of an assignment like x := x + 1 is usually not sound, even on program paths containing an instance of this assignment pattern. Hence a reasonable notion of admissibility has to be much more restrictive than the one for expression motion: we require that the ordering among blocking instructions is entirely preserved. In other words, code motion must not violate a data dependency or anti-dependency in the original program.

² This is in accordance with the view of $AS_\alpha$ as a function from $\mathcal{G}$ to $\mathcal{G}$ that maps $G$ to $G'$.

³ More precisely, $\dot n$ is the entry point of a node $n$ where the occurrence is removed.

⁴ In the case that the right-hand side expression does not exist, e.g. if $\beta = skip$, or that there is no left-hand side variable, e.g. if $\beta = out(\dots)$, we assume that the special value $\top$ is assigned to $\varphi_\beta$ and $x_\beta$, respectively.

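Definition 6.1.2 (Admissible Assignment Motion).

1. An assignment sinking $AS_\alpha = G \xrightarrow{as_\alpha} G'$ is admissible iff:

a) the removed original assignment patterns of $\alpha$ are substituted, i.e. for each $\dot n \in \dot N$ (see Figure 6.1 for illustration)
$$AS_\alpha\text{-}Remove_{\dot n} \Rightarrow AS_\alpha\text{-}Subst_{\dot n},$$
where $AS_\alpha\text{-}Subst_{\dot n} \Leftrightarrow \forall p \in \mathbf{P}[\dot n, \dot e]\; \exists\, 1 \le i \le |p|.\; AS_\alpha\text{-}Insert_{p_i} \wedge \forall\, 1 < j < i.\; \neg Blocked_{p_j}$

b) the inserted instances of $\alpha$ are justified, i.e. for each $\dot n \in \dot N$ (see Figure 6.2 for illustration)
$$AS_\alpha\text{-}Insert_{\dot n} \Rightarrow AS_\alpha\text{-}Just_{\dot n},$$
where $AS_\alpha\text{-}Just_{\dot n} \Leftrightarrow \forall p \in \mathbf{P}[\dot s, \dot n]\; \exists\, 1 \le i \le |p|.\; AS_\alpha\text{-}Remove_{p_i} \wedge \forall\, i < j < |p|.\; \neg Blocked_{p_j}$

2. The admissibility of an assignment hoisting is defined analogously.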
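To make the two quantifier patterns concrete, the following Python sketch evaluates $AS_\alpha\text{-}Subst$ and $AS_\alpha\text{-}Just$ on a single, explicitly given path. It is a hypothetical illustration under simplifying assumptions: each program point carries precomputed `blocked`, `insert` and `remove` flags, indices are 0-based (so the range $1 < j < i$ becomes `path[1:i]`), and a real admissibility check has to quantify over all paths, typically by a data-flow analysis rather than by path enumeration.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    """A program point on a path, annotated with the predicates of Def. 6.1.2."""
    blocked: bool = False  # Blocked (entry property of the associated node)
    insert: bool = False   # AS-Insert: an instance of alpha is inserted here
    remove: bool = False   # AS-Remove: an original occurrence is removed here


def subst_on_path(path: List[Point]) -> bool:
    """AS-Subst restricted to one path p_1 ... p_|p| starting at a removal point.

    Some p_i carries an insertion and no p_j with 1 < j < i is blocked.
    """
    for i, pt in enumerate(path):
        if pt.insert and not any(q.blocked for q in path[1:i]):
            return True
    return False


def just_on_path(path: List[Point]) -> bool:
    """AS-Just restricted to one path ending at an insertion point p_|p|.

    Some p_i carries a removal and no p_j with i < j < |p| is blocked.
    """
    last = len(path) - 1
    for i, pt in enumerate(path):
        if pt.remove and not any(q.blocked for q in path[i + 1:last]):
            return True
    return False


# Removal at p_1 (its node holds alpha and thus "blocks" trivially), insertion at p_3.
print(subst_on_path([Point(blocked=True, remove=True), Point(), Point(insert=True)]))  # True
```

Excluding the first point from the blockade check matters: the removal point is the entry of the node holding $\alpha$ itself, whose own left-hand side modification would otherwise block every movement. Symmetrically, the insertion point itself may be blocked, since $\alpha$ is placed before that node.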




Fig. 6.1. Substitution of removed assignments

Fig. 6.2. Justification of inserted assignments



In essence, admissible assignment motions are pure repositionings of code that do not achieve any program improvement on their own. In fact, on every program path from $s$ to $e$ there is a one-to-one correspondence between the replaced original occurrences and the inserted ones: on the one hand, there is a unique insertion belonging to each replaced original occurrence, which is forced by the usage of the $\exists$-quantifier in the definition of the predicate $AS_\alpha\text{-}Subst$; on the other hand, the complementary predicate $AS_\alpha\text{-}Just$ ensures that each insertion is justified by at most one removal on this path.⁵ The set of all admissible assignment sinkings and the set of all admissible assignment hoistings with respect to $\alpha$ are denoted by $\mathcal{AAS}_\alpha$ and $\mathcal{AAH}_\alpha$, respectively.



6.2 Partial Dead Code Elimination 

As an application of assignment sinking we present the elimination of par- 
tially dead assignments, or for short partial dead code elimination, as intro- 
duced in [KRS94b]. 



6.2.1 Motivation 

Dead code elimination [Hec77, W.78, Kou77] is a technique for improving 
the efficiency of a program by avoiding unnecessary assignments to variables 
at run-time. Usually, an assignment is considered unnecessary, if it is totally 
dead, i.e. if the content of its left-hand side variable is not used in the remainder of the program. Partially dead assignments such as the one in node 1 of Figure 6.3(a), which is dead on the left branch but alive on the right one, are out of the scope of (classical) dead code elimination.

In [KRS94b] we presented a technique for partial dead code elimination where assignments are sunk to places at which they become entirely dead and can be eliminated. The basic idea is illustrated in Figure 6.3. The point here is that the assignment to x at node 1 is only used on the right branch, since x is redefined at node 2. After sinking the assignment to x from node 1 to node 3 and to the entry of node 2, it becomes dead in the latter case and can be eliminated. Hence the point of partial dead code elimination is to sink partially dead assignments as far as possible in the direction of the control flow, while maintaining the program semantics. This way, statements are placed in as specific a context as possible, which maximises the potential of dead code elimination. Therefore, our approach in [KRS94b] is essentially dual to partial redundancy elimination as considered in the first part of this book, where expressions are hoisted against the control flow as far as possible in order to make their effects as universal as possible. It should be noted that our approach was the first one that observed this duality. However, independently of our algorithm, two other approaches addressing the elimination of partially dead code came up.

⁵ Note that in the case of two preceding removals the latter one would be a blockade for the former one.

Fig. 6.3. Partial dead code elimination

Feigen et al. [FKCX94] developed an algorithm that builds more complex 
entities for movement whenever the elementary statements are blocked. In 
contrast to the usual code motion algorithms, this may in particular modify
the branching structure of the program under consideration. However, their 
algorithm is not capable of moving statements out of loops or even across 
loops. Furthermore, movements are restricted to one-to-one movements, i. e. 
code fragments are removed at one place in order to be implanted exactly at 
one other place, where the fragment is alive. Later Bodik and Gupta [BG97] 
developed a more systematic structure-modifying technique for partial dead 
code elimination based on the predication of instructions which, however, is 
of exponential run-time complexity when applied exhaustively. 

Finally, Briggs and Cooper proposed an algorithm [BC94] that employs
instruction sinking for the reassociation of expressions. As a by-product some 
partially dead assignments can be removed. However, in contrast to our al- 
gorithm their strategy of instruction sinking can significantly impair certain 
program executions, since instructions can be moved into loops in a way 
which cannot be “repaired” by a subsequent partial redundancy elimination. 



6.2.2 Defining Partial Dead Code Elimination 

Formally, partial dead code elimination (PDCE) stands for any sequence of
elementary transformations of the following type: 

— admissible assignment sinkings and 

— eliminations of dead assignments. 

Whereas admissible assignment sinkings have already been introduced (cf. Definition 6.1.2(1)), the elimination transformations still need to be defined.

Definition 6.2.1 (Dead Assignment Elimination). 

1. An occurrence of an assignment pattern $\alpha = x := \varphi$ is dead at $n \in N$ if on every path from $n$ to $e$ a usage of $x$ is always preceded by a redefinition of $x$. Formally, this is expressed by the following predicate:
$$\forall p \in \mathbf{P}[n, e],\, 1 < i \le |p|.\; LhsUsed_{p_i} \Rightarrow \exists\, 1 < j < i.\; LhsMod_{p_j}$$

2. A dead assignment elimination with respect to $\alpha \in \mathcal{AP}(G)$ is a program transformation that eliminates some dead original occurrences of $\alpha$ in the argument program.

The set of all dead assignment eliminations with respect to $\alpha$ is denoted by $\mathcal{DAE}_\alpha$. As for assignment motions (cf. Definition 6.1.1), we use the notation $G \xrightarrow{dae_\alpha} G'$ to express that $G'$ results from $G$ by applying a dead assignment elimination with respect to $\alpha$. Moreover, for a given dead assignment elimination $DAE_\alpha = G \xrightarrow{dae_\alpha} G'$ the term $DAE_\alpha(G)$ refers to the program $G'$ that results from the particular transformation.
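Deadness in the sense of Definition 6.2.1 coincides with the left-hand side variable not being live at the exit of the node, so it can be computed by the classical backward liveness analysis. The Python sketch below assumes a hypothetical node representation `(lhs, uses)` with one instruction per node and `lhs = None` for statements without a left-hand side such as `out(...)`; the example graph only loosely follows the shape of Figure 6.3, and its node names are invented.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical encoding: node name -> (lhs or None, variables read by the node).
Node = Tuple[Optional[str], FrozenSet[str]]


def live_out(nodes: Dict[str, Node], succ: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Iterate backward liveness to its least fixpoint; return the live-out sets."""
    live_in: Dict[str, Set[str]] = {n: set() for n in nodes}
    out: Dict[str, Set[str]] = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n, (lhs, uses) in nodes.items():
            new_out = set().union(*(live_in[m] for m in succ.get(n, [])))
            new_in = set(uses) | (new_out - {lhs})
            if new_out != out[n] or new_in != live_in[n]:
                out[n], live_in[n] = new_out, new_in
                changed = True
    return out


def dead_assignments(nodes: Dict[str, Node], succ: Dict[str, List[str]]) -> Set[str]:
    """Nodes whose assignment is dead: its lhs is not live at the node exit."""
    out = live_out(nodes, succ)
    return {n for n, (lhs, _) in nodes.items() if lhs is not None and lhs not in out[n]}


nodes = {
    "1": ("x", frozenset({"a", "b"})),  # x := a + b  (partially dead)
    "2": ("x", frozenset({"c"})),       # x := c      (left branch)
    "3": ("y", frozenset({"x"})),       # y := x      (right branch, y never used)
    "4": (None, frozenset({"x"})),      # out(x)
}
succ = {"1": ["2", "3"], "2": ["4"], "3": ["4"]}
print(sorted(dead_assignments(nodes, succ)))  # ['3']
```

Only `y := x` is reported: `x := a + b` at node 1 is live along the right branch and thus merely partially dead, which is exactly the case that requires sinking before it can be eliminated.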

6.2.3 Second Order Effects in Partial Dead Code Elimination 

As already mentioned, the effect of partial dead code elimination is based 
on the mutual interactions of assignment sinkings and dead assignment elim- 
inations. Each of these transformation types may expose opportunities for 
the other one. Such effects are commonly known as second order effects 
[RWZ88, DRZ92] in program optimisation. Since second order effects are 
of major importance for assignment motion based program transformations, 
we will use PDCE as a representative for a systematic classification of all
possible effects. 

Motion-Elimination Effects These are the effects of primary interest: an 
assignment is moved until it can be eliminated by a dead assignment elimi- 
nation. Reconsider the motivating example of Figure 6.3 for illustration. 

Motion-Motion Effects These are effects where the movement of one as- 
signment may provide opportunities for other assignments to move. This can 
happen for various reasons, e.g., if the assignment that is moved first is a 
use site or redefinition site for the other one, or if it modifies an operand of 
the other’s right-hand side expression. The latter case is illustrated in Figure 
6.4(a). 

Without the previous movement of the assignment at node 2 the assign- 
ment at node 1 can move at most to the entry of node 2, since a further 
movement would deliver a different value for the right-hand side computa- 
tion. However, if we anticipate the movement of the assignment at node 2 
then it can be sunk to the entry of node 5. Thereafter, the assignment at 
node 1 can be moved to nodes 3 and 4. In the case of node 4 it becomes
entirely dead and can finally be eliminated. 

Elimination-Motion Effects Such effects show up when the elimination of 
a dead assignment enables the movement of another assignment as depicted 
in Figure 6.5(a). Here, the assignment a := ... at node 1 cannot be moved 
to the entry of node 3 without violating the admissibility of this movement. 
Fig. 6.4. Motion-motion effect



But it can be removed by a dead assignment elimination, since its value is of no further use. Finally, this removal enables the assignment y := a + b to be sunk to nodes 4 and 5 in order to eliminate further partially dead assignments, leading to the resulting program displayed in Figure 6.5(b).




Elimination-Elimination Effects Such an effect is depicted in Figure 
6.6(a). Here, the assignment at node 4 is dead and can be eliminated, since 
the left-hand side variable y is redefined before it is used on every path leading to the end node. Thereafter, the assignment to a at node 1, which was not dead before due to its usage at node 4, becomes dead as well and can be removed, as shown in Figure 6.6(b).

The Power of Resolving Second Order Effects We close this section on 
second order effects with a more complex example that is suitable to illustrate 
the enormous potential that lies in the exhaustive resolution of second order 
effects. 
Fig. 6.6. Elimination-elimination effect



The most striking inefficiency in the example of Figure 6.7 is the “loop invariant” code fragment in node 2. Note, however, that the occurrence of a + b inside the loop is out of the scope of expression motion, since the first instruction defines an operand of the second one. Nevertheless, PDCE can perform the optimisation displayed in Figure 6.8, where no further assignment sinking or dead assignment elimination is applicable.⁶

After removing the second assignment from the loop,⁷ the first assignment
turns out to be loop invariant, too, and can be removed from the loop as well. 

⁶ Synthetic nodes are inserted on demand.

⁷ Note that this requires two steps. First, the assignment x := a + b is moved from node 2 to nodes 3 and 4. Afterwards, the dead occurrence at node 3 can be eliminated.



Fig. 6.7. A complex example illustrating the power lying in the exhaustive exploitation of second order effects
Fig. 6.8. Exhaustive PDCE applied to Figure 6.7




Thereafter, these two assignments together with the blockade at node 4 can be subjected to further movements and eliminations. Note that the synthetic node $S_{6,8}$ is finally the only point where the execution of the loop invariant fragment is actually required. In fact, PDCE supplies the output instructions at nodes 7 and 8 with exactly the sequences of assignments that establish the correct output on every path. This significantly reduces the number of executed assignments in some situations. For instance, the number of assignments occurring on the path (1, 2, 3, 2, 3, 2, 4, 5, 7, 9) is reduced from 8 to 1.



6.3 Partial Faint Code Elimination 

The definition of deadness (cf. Definition 6.2.1) is too narrow to characterise all assignments that are of no use, i.e. that do not contribute to the evaluation of an immobile statement such as an output statement. In [HDT87, GMW81] deadness is weakened towards a notion of faintness, which is given in terms of a recursive definition (cf. Definition 6.3.1). Faintness of assignments not only covers all situations that can be detected through the iterated elimination of dead assignments, but also captures some unnecessary assignments that are out of the scope of iterated dead assignment elimination. This is illustrated in Figure 6.9, which is taken from [HDT87]. The point here is that the assignment to x inside the loop cannot be eliminated by classical dead assignment elimination, since the assignment itself is a use site of its left-hand side variable.
Fig. 6.9. A faint but not dead assignment



Definition 6.3.1 (Faint Assignment Elimination). 

1. An occurrence of an assignment pattern $\alpha = x := \varphi$ at a node $n \in N$ is faint if on every program path leading from $n$ to $e$ a usage of $x$

— is either preceded by a redefinition of $x$ or

— defines a faint assignment as well.

Formally, the property is captured by means of the greatest solution of a recursive predicate:⁸
$$Faint^\alpha_n \Leftrightarrow \forall p \in \mathbf{P}[n, e],\, 1 < i \le |p|.\; LhsUsed_{p_i} \Rightarrow \big(\exists\, 1 < j < i.\; LhsMod_{p_j}\big) \vee Faint^\beta_{p_i},$$
where $\beta$ is the assignment pattern that is associated with node $p_i$.

2. A faint assignment elimination with respect to $\alpha$ is a program transformation that eliminates some faint original occurrences of $\alpha$ in the argument program.

The notations $G \xrightarrow{fae_\alpha} G'$ and $FAE_\alpha(G)$ for $FAE_\alpha = G \xrightarrow{fae_\alpha} G'$ are introduced in analogy to their counterparts for dead assignment elimination. The set of all faint assignment eliminations with respect to $\alpha$ is denoted by $\mathcal{FAE}_\alpha$.

Hence, formally, partial faint code elimination (PFCE) stands for any sequence of:

— admissible assignment sinkings and

— eliminations of faint assignments.
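The greatest solution of the faintness predicate can be computed as the complement of a least fixpoint, often called strong liveness: an operand counts as used only if it feeds an immobile statement or an assignment whose own left-hand side is strongly live. The Python sketch below uses that standard formulation rather than the algorithm of [HDT87]; the node encoding and all names are hypothetical, and the example is only modelled on the shape of Figure 6.9. For comparison, the same routine computes ordinary liveness when `strong=False`.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical encoding: node name -> (lhs or None, variables read by the node).
# lhs None marks an immobile statement such as out(...) or a branch condition.
Node = Tuple[Optional[str], FrozenSet[str]]


def live_out(nodes: Dict[str, Node], succ: Dict[str, List[str]],
             strong: bool) -> Dict[str, Set[str]]:
    """Least fixpoint of (strong) liveness at node exits.

    With strong=True the operands of an assignment only count as used if its
    left-hand side is strongly live afterwards; immobile statements always count.
    """
    live_in: Dict[str, Set[str]] = {n: set() for n in nodes}
    out: Dict[str, Set[str]] = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n, (lhs, uses) in nodes.items():
            new_out = set().union(*(live_in[m] for m in succ.get(n, [])))
            relevant = not strong or lhs is None or lhs in new_out
            new_in = (set(uses) if relevant else set()) | (new_out - {lhs})
            if new_out != out[n] or new_in != live_in[n]:
                out[n], live_in[n] = new_out, new_in
                changed = True
    return out


def useless(nodes: Dict[str, Node], succ: Dict[str, List[str]], strong: bool) -> Set[str]:
    """Assignments whose lhs is not (strongly) live at the node exit."""
    out = live_out(nodes, succ, strong)
    return {n for n, (lhs, _) in nodes.items() if lhs is not None and lhs not in out[n]}


nodes = {
    "1": ("x", frozenset()),        # x := 0
    "2": ("x", frozenset({"x"})),   # x := x + 1  (loop body)
    "3": (None, frozenset({"y"})),  # out(y)
}
succ = {"1": ["2"], "2": ["2", "3"], "3": []}

dead = useless(nodes, succ, strong=False)   # in the sense of Definition 6.2.1
faint = useless(nodes, succ, strong=True)   # in the sense of Definition 6.3.1
print(sorted(dead), sorted(faint))          # [] ['1', '2']
```

Ordinary liveness keeps both assignments to x alive because `x := x + 1` reads x, whereas the strong variant recognises both as faint.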



⁸ Note that the least solution exactly characterises those assignments that can be eliminated through iterated dead code elimination.

6.4 Partially Redundant Assignment Elimination

Complementary to PDCE, assignment hoisting can also be employed profitably in a straightforward way by introducing the elimination of partially redundant assignments (PRAE) as a technique that combines:

— admissible assignment hoistings and

— redundant assignment eliminations,

where the latter transformations are defined as follows:

Definition 6.4.1 (Redundant Assignment Elimination).

1. An occurrence of an assignment pattern $\alpha = x := \varphi$ at a node $n$ is redundant if on every program path leading from $s$ to $n$ there is a strictly prior occurrence of $\alpha$ such that no instruction in between both occurrences modifies

a) the left-hand side variable $x$ or

b) an operand of the right-hand side expression $\varphi$.⁹

Formally, redundant assignments are captured by means of the following predicate:
$$Redundant^\alpha_n \Leftrightarrow \forall p \in \mathbf{P}[s, n]\; \exists\, 1 \le i < |p|.\; AssOcc_{p_i} \wedge \forall\, i < j < |p|.\; \neg LhsMod_{p_j} \wedge \forall\, i < j \le |p|.\; \neg RhsMod_{p_j}$$

2. A redundant assignment elimination with respect to $\alpha$ is a program transformation that eliminates some redundant original occurrences of $\alpha$ in the argument program.

Once more, the notations $G \xrightarrow{rae_\alpha} G'$ and $RAE_\alpha(G)$ for $RAE_\alpha = G \xrightarrow{rae_\alpha} G'$ are introduced in analogy to dead and faint assignment eliminations. The set of all redundant assignment eliminations with respect to $\alpha$ is denoted by $\mathcal{RAE}_\alpha$.
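Redundancy in the sense of Definition 6.4.1 can be decided by a forward must-analysis in the style of available expressions: $\alpha$ is generated by its own non-recursive occurrences and killed by $LhsMod$ or $RhsMod$. The sketch below is a hypothetical Python illustration, not the book's algorithm; each node holds a single instruction encoded as `(lhs, rhs_text, rhs_vars)`, and the final test at node $n$ excludes only $RhsMod_n$, mirroring the different ranges in the formal definition.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical encoding: node name -> (lhs or None, rhs text, variables in rhs).
Node = Tuple[Optional[str], str, FrozenSet[str]]


def redundant_occurrences(nodes: Dict[str, Node], pred: Dict[str, List[str]],
                          start: str, alpha: Node) -> Set[str]:
    """Occurrences of alpha = x := phi that are redundant (Definition 6.4.1)."""
    x, _, phi_vars = alpha

    def lhs_mod(n: str) -> bool:
        return nodes[n][0] == x

    def rhs_mod(n: str) -> bool:
        lhs = nodes[n][0]
        return lhs is not None and lhs in phi_vars

    def occ(n: str) -> bool:
        return nodes[n] == alpha

    # Must-analysis: start optimistically (greatest fixpoint), except at s.
    avail_in = {n: n != start for n in nodes}
    avail_out = {n: True for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new_in = n != start and all(avail_out[m] for m in pred.get(n, []))
            # alpha is established by a non-recursive occurrence and survives
            # nodes that modify neither x nor an operand of phi.
            new_out = (occ(n) and not rhs_mod(n)) or \
                      (new_in and not lhs_mod(n) and not rhs_mod(n))
            if (new_in, new_out) != (avail_in[n], avail_out[n]):
                avail_in[n], avail_out[n] = new_in, new_out
                changed = True
    # At n itself only RhsMod is excluded (cf. the footnote on x := x + 1).
    return {n for n in nodes if occ(n) and avail_in[n] and not rhs_mod(n)}


cd = frozenset({"c", "d"})
nodes = {
    "1": ("y", "c + d", cd),                     # y := c + d
    "2": ("x", "y + z", frozenset({"y", "z"})),  # x := y + z (only reads y)
    "3": ("y", "c + d", cd),                     # redundant
    "4": ("c", "1", frozenset()),                # c := 1 kills alpha
    "5": ("y", "c + d", cd),                     # not redundant
}
pred = {"2": ["1", "5"], "3": ["2"], "4": ["3"], "5": ["4"]}  # 5 -> 2 is a back edge
print(redundant_occurrences(nodes, pred, "1", nodes["1"]))  # {'3'}
```

A recursive pattern such as `i := i + 1` is never reported, since its own occurrence satisfies $RhsMod$.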



6.5 The Uniform Elimination of Partially Redundant 
Assignments and Expressions 

Unfortunately, in practice the optimisation potential of PRAE is quite limited, since redundant assignment elimination is based on complete assignment patterns. In contrast, the elimination of redundant expressions only matches right-hand side expressions, and dead code elimination even only focuses on the left-hand side variables of assignments. However, the importance of PRAE lies in its catalytic function with regard to expression motion. This effect was first discovered by Dhamdhere [Dha91], whose solution, however, wastes much of the potential of such a combination because of an inappropriate mixture of optimisation goals.¹⁰

⁹ To exclude recursive assignment patterns like x := x + 1 from being redundant, the occurrence at node $n$ itself must not modify an operand of $\varphi$. In the formal definition this is reflected by means of slightly different ranges to be excluded for left- and right-hand side modifications.

¹⁰ More precisely, the proposal incorporates lifetime considerations of temporaries too early. A more detailed discussion of this aspect can be found in Section 7.4.3.
In principle, PRAE shows the same mutual dependencies between eliminations and assignment motions as illustrated for PDCE. In addition, there are also strong second order effects between the elimination of partially redundant expressions¹¹ (PREE) and PRAE, which are listed in the following:

PREE-PRAE Effects In Figure 6.10(a) the assignment to a at node 5 cannot be hoisted out of the loop, since it is blocked by the assignments at nodes 3 and 4 that cannot be subjected to a successful PRAE. However, using PREE the expression a + b can be hoisted to node 1 and to (the synthetic node inserted on) the back edge of the loop, which subsequently enables PRAE to hoist the assignment a := 1, as displayed in Figure 6.10(b).




PRAE-PREE Effects Conversely, PRAE can also enhance the potential for PREE. To this end we only have to slightly modify Figure 6.10. This leads to Figure 6.11, where the assignment a := 1 at node 2 first has to be treated with PRAE before PREE can be applied profitably.

In [KRS95] we presented an algorithm for the uniform elimination of partially redundant expressions and assignments (UPRE) that shares the advantages of both PREE and PRAE, while even enhancing their individual power significantly. Formally, UPRE stands for any sequence of

— partially redundant expression eliminations (PREE) and

— partially redundant assignment eliminations (PRAE).

To give an impression of the power of UPRE, let us consider the example of Figure 6.12, which is taken from [KRS95].

Figure 6.13 shows the separate results that emanate from an application of pure PREE and pure PRAE. In particular, both results fail to eliminate the most significant inefficiency in the program of Figure 6.12, the “loop invariant” assignment x := y + z in node 3. For PREE this is due to the fact that the right-hand side expression y + z is blocked by the assignment to y in node 3, whose elimination is out of the scope of PREE. On the other hand, pure PRAE does not succeed either, since x is used in the Boolean expression controlling the loop iteration, which cannot be subjected to PRAE.

¹¹ PREE stands as a synonym for expression motion.

UPRE performs the optimisation displayed in Figure 6.14. This optimisation is achieved by the following steps: PRAE can resolve the assignment y := c + d in node 3, which is redundant with respect to the corresponding assignment of node 1. Moreover, c + d can be subjected to PREE, which initialises a temporary h1 in block 1. Similarly, PREE with respect to x + z initialises a temporary h2 in blocks 1 and 3. The original occurrences of c + d in nodes 1 and 4 are replaced by h1, and the original occurrence of x + z in node 2 is replaced by h2. This finally eliminates the blockade of the assignment x := y + z within the loop. Thus, as a second order effect, the assignment x := y + z can now be removed from the loop by hoisting it (together with the corresponding assignment at node 4) to node 1.




Fig. 6.12. Illustrating the power of UPRE

Note that neither the assignment i := i + x in node 3 nor the computations of y + i and i + x in nodes 2 and 3 are touched, since they cannot be moved profitably.

However, the previous example does not suggest a general way in which PREE and PRAE have to be employed in order to take maximal benefit from each other. This problem was solved in [KRS95], where we succeeded in developing an algorithm that performs as well as possible, at least with respect to the primary goal of partial redundancy elimination, which is to avoid recomputations of expressions. A key point in this algorithm is the fact that PREE can completely be simulated by PRAE using a simple preprocess (cf. Lemma 7.4.1). Besides computational optimality, secondary optimisation goals are taken into account, namely to minimise the number of (trivial) assignments and to minimise the lifetimes of the temporaries introduced by expression motion. In [KRS95] we showed that these goals cannot be solved optimally, since distinct, incomparable solutions may exist. As a heuristic, a final flush phase moves assignments forwards in the direction of control flow as far as possible. However, this solution suffers from two drawbacks: first, as in lazy expression motion, interdependencies between distinct lifetime ranges are not taken into account; second, lifetimes of temporaries that are already introduced in advance for the purpose of decomposing large expressions are distinguished from the ones introduced by expression motion. To overcome these drawbacks, in Chapter 7 we sketch a more ambitious algorithm for UPRE that uniformly copes with all kinds of temporaries and incorporates trade-offs between lifetime ranges by adapting the techniques of Chapter 4, which finally results in a lifetime optimal solution for UPRE.




7. A Framework for Assignment Motion Based 
Program Transformations 



In this chapter we present a general framework for assignment motion based program transformations that covers all the applications considered in Chapter 6: PDCE, PFCE, PRAE and UPRE. Within this framework the interleaving of admissible assignment motions and eliminations can be investigated in a general setting. Section 7.1 presents the foundation for this framework. The instances of the framework, motion-elimination couples, combine assignment motions with eliminations. As the presence of second order effects requires applying the component transformations repeatedly, this process raises the following questions of primary interest:

— Is the process confluent, i.e. is the final result independent of the application order? Section 7.2 is devoted to this question.

— What are the additional costs (compared to the elementary transformations) in terms of computational complexity? This question is addressed in Section 7.3.

Section 7.4 finally considers UPRE as a more ambitious application for reasoning within the framework.



7.1 The Setting 

7.1.1 Motion-Elimination Couples 

The program transformations considered in Chapter 6 have in common that they are composed of

— assignment motions and 

— eliminations 

that mutually influence each other. More formally, we define: 

Definition 7.1.1 (Motion-Elimination Couple). 

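A motion-elimination couple (MEC) is a pair $(\mathcal{T}_{am}, \mathcal{T}_{el})$, where

— $\mathcal{T}_{am}$ is a set of admissible assignment motions as defined in Definition 6.1.2, i.e. $\mathcal{T}_{am} = \bigcup_{\alpha \in \mathcal{AP}(G)} \mathcal{AAS}_\alpha$ or $\mathcal{T}_{am} = \bigcup_{\alpha \in \mathcal{AP}(G)} \mathcal{AAH}_\alpha$.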



O. Ruething: Interacting Code Motion Transformations, LNCS 1539, pp. 153-197, 1998. 
© Springer-Verlag Berlin Heidelberg 1998 
— $\mathcal{T}_{el}$ is a set of elimination transformations of a type as introduced in Chapter 6, i.e. $\mathcal{T}_{el} \subseteq \bigcup_{\alpha \in \mathcal{AP}(G)} \big(\mathcal{DAE}_\alpha \cup \mathcal{FAE}_\alpha \cup \mathcal{RAE}_\alpha\big)$.

Note that MECs are mainly parameterised by their eliminations $\mathcal{T}_{el}$, since for assignment motions only the direction of movement is left open. The restriction to either sinkings or hoistings ensures that non-trivial assignment motions cannot be undone by other movements, which would otherwise allow non-terminating sequences of assignment motions. Moreover, the usage of equality rather than containment ensures that we can find assignment motions that progress sufficiently fast.

7.1.2 A Note on the Correctness of Motion-Elimination Couples 

The elementary transformations of an MEC as defined in Chapter 6 are intended not to modify relevant aspects of the program behaviour. However, there are significant differences between them of which the user should be aware. The transformations introduced so far preserve total correctness [DGG+96] of a program, i.e. a terminating execution in the source program has a corresponding terminating execution in the transformed program, both coinciding in their visible behaviours. On the other hand, all transformations apart from partial dead code elimination and partial faint code elimination also preserve partial correctness [DGG+96]: each terminating execution in the transformed program has a corresponding terminating execution in the original program such that the visible behaviours in terms of their output sequences coincide.¹ $\mathcal{DAE}_\alpha$ and $\mathcal{FAE}_\alpha$ do not preserve partial correctness, as the elimination of a dead or faint assignment like x := 1/0 may cause regular termination on a program path that did not terminate in the original program. However, we can easily overcome this defect, since it is obviously caused by the elimination of assignments whose right-hand sides contain partially defined operators. This should be clearly distinguished from the case where partial correctness is violated because a non-terminating loop is made terminating. Fortunately, this can be excluded for dead (faint) code elimination. Essentially, this is due to the fact that branching instructions are classified as immobile statements, which makes it impossible to eliminate any assignment on which such an instruction may depend, even indirectly. Therefore, to accomplish the preservation of partial correctness for PDCE and PFCE, too, we only have to exclude any assignment pattern that contains partially defined operators from the feasible optimisation candidates. Since most operators are totally defined on the source level, this does not diminish these techniques too much. Finally, static analysis techniques for range checking [Har77] could be employed to discover additional cases where assignments with partially defined operators do no harm, i.e. definitely evaluate to defined values.

¹ Note that neither of these notions implies the other one. The transformation that maps each program to a non-terminating one preserves partial, though not total, correctness. Conversely, as explained in the text, $\mathcal{DAE}_\alpha$ is an example of a transformation that preserves total but not partial correctness.
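The loss of partial correctness can be reproduced in a few lines. In this hypothetical Python sketch, the exception raised by integer division by zero stands in for the undefined, non-terminating evaluation of x := 1/0, and `after_dae` is what a dead assignment elimination would produce. The filter `feasible_candidate` and its operator set `PARTIAL_OPERATORS` are likewise assumptions, illustrating the proposed exclusion of assignment patterns with partially defined operators.

```python
PARTIAL_OPERATORS = {"/", "//", "%"}  # assumption: the partially defined operators


def original(a: int) -> int:
    x = 1 // a   # dead: x is redefined before any use, but may fail
    x = 2
    return x     # stands in for the visible output


def after_dae(a: int) -> int:
    x = 2        # the dead assignment x := 1 // a has been eliminated
    return x


def feasible_candidate(rhs_operators: set) -> bool:
    """Only assignments without partially defined operators may be eliminated."""
    return not (rhs_operators & PARTIAL_OPERATORS)


print(after_dae(0))                  # 2
try:
    original(0)
except ZeroDivisionError:
    print("original program fails")  # no terminating counterpart
print(feasible_candidate({"//"}))    # False
```

For `a = 0` the transformed program terminates regularly while the original does not, so a terminating run of the transformed program has no terminating counterpart in the original; total correctness is unaffected, since every terminating run of the original is matched.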

7.1.3 Notational Conventions 

Since the MEC under consideration is always obvious from the context, the following notions are understood relative to this MEC, which helps to avoid a highly parameterised notation.

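Elementary Transformation Steps In accordance with the notation of the elementary transformations introduced in Chapter 6, we use a generic notation for the transformations of the underlying MEC $(\mathcal{T}_{am}, \mathcal{T}_{el})$, which refers to the respective type of assignment motions and eliminations under consideration:

— $G \xrightarrow{am_\alpha} G'$ is used instead of $G \xrightarrow{as_\alpha} G'$ or $G \xrightarrow{ah_\alpha} G'$, according to the type of assignment motions used in $\mathcal{T}_{am}$, and

— $G \xrightarrow{el_\alpha} G'$ is used instead of $G \xrightarrow{dae_\alpha} G'$, $G \xrightarrow{fae_\alpha} G'$ or $G \xrightarrow{rae_\alpha} G'$, according to the type of elimination that is responsible for the $\mathcal{T}_{el}$-transition from $G$ to $G'$.

Moreover, the notation $G \longmapsto G'$ is used as an abbreviation for any of the elementary transformations, i.e.
$$G \longmapsto G' \Leftrightarrow \exists \alpha \in \mathcal{AP}(G).\; G \xrightarrow{am_\alpha} G' \in \mathcal{T}_{am} \,\vee\, G \xrightarrow{el_\alpha} G' \in \mathcal{T}_{el}$$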

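Source- and Destination-Occurrences Let $AM_\alpha = G \xrightarrow{am_\alpha} G' \in \mathcal{T}_{am}$. As already mentioned, Definition 6.1.2 establishes a one-to-one correspondence between the original occurrences on a program path $p \in \mathbf{P}_G[s, e]$ and the inserted occurrences on the associated path $p' \in \mathbf{P}_{G'}[s, e]$. Therefore, we can define for $occ_{\alpha,p} \in Occ_{\alpha,p}(G)$ and $occ'_{\alpha,p'} \in Occ_{\alpha,p'}(G')$:

$Src_{AM_\alpha}(occ'_{\alpha,p'})$: the $p$-occurrence corresponding to $occ'_{\alpha,p'}$

$Dst_{AM_\alpha}(occ_{\alpha,p})$: the $p'$-occurrence corresponding to $occ_{\alpha,p}$

This notion can naturally be generalised to program occurrences. For occurrences $occ_\alpha \in Occ_\alpha(G)$ and $occ'_\alpha \in Occ_\alpha(G')$ we have:
$$Src_{AM_\alpha}(occ'_\alpha) \overset{df}{=} \{\, occ_\alpha \mid occ_{\alpha,p} = Src_{AM_\alpha}(occ'_{\alpha,p'}),\; occ'_{\alpha,p'} \in Occ_{\alpha,p'}(G') \,\}$$
$$Dst_{AM_\alpha}(occ_\alpha) \overset{df}{=} \{\, occ'_\alpha \mid occ'_{\alpha,p'} = Dst_{AM_\alpha}(occ_{\alpha,p}),\; occ_{\alpha,p} \in Occ_{\alpha,p}(G) \,\}$$

Finally, for sets of $\alpha$-occurrences $M_\alpha$ we define:
$$Src_{AM_\alpha}(M_\alpha) \overset{df}{=} \bigcup_{occ_\alpha \in M_\alpha} Src_{AM_\alpha}(occ_\alpha) \qquad Dst_{AM_\alpha}(M_\alpha) \overset{df}{=} \bigcup_{occ_\alpha \in M_\alpha} Dst_{AM_\alpha}(occ_\alpha)$$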

For convenience, we simply identify occa with Dstf^fi^(occa), if occa is not 
moved at all. Moreover, it is easy to extend the notions to elimination trans- 
formations in Tel yielding definitions for Dst^i^^(occa) and Src-gj^^(occa), re- 
spectively. The only difference is that Dst^j^^(occa) can be empty. Again 
post-elimination occurrences are identified with their corresponding original 
occurrence. Finally, the notions defined above can also be extended to se- 
quences of eliminations and assignment motions. Therefore, for transforma- 
tions TRj = G^-l I — > Gi e Tam ^ Tel, i = {I, ■ ■ ■ , k) wB USB TRi; . . . ; TRfe(G) 
as a shorthand for TRfc(. . . (TRi(Go)) . . . ). On the other hand, sequences 
of transformations are sometimes implicitly specified by using a notion 
TRi;... ;TRfc, which stands for a program transformation from Go to Gk 
being composed out of appropriate elementary transformations. In the light 



of this convention we 


; define: 






;TRfc (ocCq,) = 5rc^R^ ( . . 


•('S'rCTR,(0CCa))...) 


and 




;TRfc (oCCq) — UsGr^ ( . 


. .(UstTRi(oCCa))...) 



As a special case of the above notion we define the origins of an occurrence occ_α, for which the sequence of program transformations TR_1; ...; TR_k(G) applied to the initial program G is understood:

$$\mathit{Orig}(occ_\alpha) =_{df} \mathit{Src}_{TR_1;\ldots;TR_k}(occ_\alpha)$$
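The bookkeeping behind Src, Dst and Orig can be made concrete with a small Python sketch. It is only an illustration under simplifying assumptions: each elementary transformation is represented solely by its Dst-relation, given as a dictionary from the occurrences of the argument program to sets of destination occurrences (eliminated occurrences map to the empty set, unmoved ones to themselves), and the occurrence labels as well as the helper names lift, invert, dst_seq and orig are hypothetical. The sketch shows that Dst composes in the order TR_1, ..., TR_k, whereas Src, and hence Orig, composes in the reverse order.

```python
from typing import Dict, Hashable, Iterable, List, Set

Occ = Hashable
# Dst-relation of one elementary transformation: source occurrence -> destinations.
# Assumed total on the occurrences of the argument program; eliminated
# occurrences map to the empty set, unmoved ones to themselves.
DstRel = Dict[Occ, Set[Occ]]


def lift(rel: Dict[Occ, Set[Occ]], occs: Iterable[Occ]) -> Set[Occ]:
    """Extend an occurrence-level relation to sets of occurrences (union)."""
    return set().union(*(rel.get(o, set()) for o in occs))


def invert(dst: DstRel) -> Dict[Occ, Set[Occ]]:
    """Src of a single transformation: destination -> its source occurrences."""
    src: Dict[Occ, Set[Occ]] = {}
    for occ, targets in dst.items():
        for t in targets:
            src.setdefault(t, set()).add(occ)
    return src


def dst_seq(dsts: List[DstRel], occs: Set[Occ]) -> Set[Occ]:
    """Dst of TR_1; ...; TR_k: apply Dst of TR_1 first and Dst of TR_k last."""
    for dst in dsts:
        occs = lift(dst, occs)
    return occs


def orig(dsts: List[DstRel], occs: Set[Occ]) -> Set[Occ]:
    """Orig = Src of TR_1; ...; TR_k: apply Src of TR_k first and Src of TR_1 last."""
    for dst in reversed(dsts):
        occs = lift(invert(dst), occs)
    return occs


# Hypothetical example: TR_1 sinks "a@2" into both branches "a@3" and "a@4";
# TR_2 then eliminates "a@4" and leaves "a@3" in place.
tr1 = {"a@2": {"a@3", "a@4"}}
tr2 = {"a@3": {"a@3"}, "a@4": set()}
print(dst_seq([tr1, tr2], {"a@2"}))  # {'a@3'}
print(orig([tr1, tr2], {"a@3"}))     # {'a@2'}
```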



Eliminability 

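Finally, we introduce a predicate eliminable(G, occ_α) that indicates whether an occurrence occ_α of an assignment pattern α is eliminable in G:

$$\mathit{eliminable}(G, occ_\alpha) \;\Leftrightarrow_{df}\; \exists\, EL_\alpha \equiv G \mapsto_\alpha G' \in T_{el}.\ \mathit{Dst}_{EL_\alpha}(occ_\alpha) = \emptyset$$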

7.1.4 Uniform Motion-Elimination Couples 

As mentioned, MECs are mainly parameterised by their eliminations. In ad- 
dition, reasonable MECs like the ones presented in Chapter 6 have in com- 
mon that the eliminations involved are uniformly characterised by the set 
of eliminable occurrences, i. e. every subset of eliminable occurrences can be 
eliminated on its own. More formally, we define: 

Definition 7.1.2 (Uniform Motion-Elimination Couples). An MEC (T_am, T_el) is a uniform motion-elimination couple (UMEC) if and only if for any α ∈ AP(G) and M ⊆ {occ_α ∈ Occ_α(G) | eliminable(G, occ_α)}:

$$\exists\, TR_M \in T_{el}\ \forall\, occ_\alpha \in \mathit{Occ}_\alpha(G).\ \big(\mathit{Dst}_{TR_M}(occ_\alpha) = \emptyset \Leftrightarrow occ_\alpha \in M\big)$$
7.2 Confluence of Uniform Motion-Elimination Couples

In this section we investigate under which circumstances different exhaustive 
iteration sequences with respect to a given UMEC lead to a unique result. 
This addresses the question of the confluence of the ↦-relation. Though we
usually cannot assure true stabilisation, since local reorderings of independent 
assignments may lead to infinite sequences of assignment motions, we always 
have stabilisation up to local reorderings within basic blocks. Fortunately, we 
can abstract from these local reorderings, which allows us to prove confluence 
via the weaker property of local confluence. As the central result we provide 
a simple criterion for local confluence: consistency of a UMEC.

7.2.1 The General Case 

In general, UMECs are not confluent, as the following example illustrates. To
this end we consider a UMEC that interleaves 

— admissible assignment sinkings with 

— the elimination of locally redundant assignments. 

The elimination of locally redundant assignments is defined as follows: an occurrence occ_α of an assignment α is locally redundant if it is redundant in the sense of Definition 6.4.1, where in addition the program points p_i occurring in this definition must all belong to the same basic block as ṅ.
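For illustration, local redundancy can be checked within a single basic block as in the following Python sketch. It assumes a deliberately simple instruction model (every instruction is an assignment with a textual right-hand side and the set of variables it reads); the names Assign and locally_redundant are hypothetical. The check looks for the nearest earlier identical assignment and fails as soon as the left-hand side variable or an operand of the right-hand side is modified in between; farther occurrences need not be inspected, since they would have to pass the same modification.

```python
from typing import FrozenSet, List, NamedTuple


class Assign(NamedTuple):
    lhs: str                  # left-hand side variable x
    rhs: str                  # right-hand side term, as text
    operands: FrozenSet[str]  # variables read by the right-hand side


def locally_redundant(block: List[Assign], k: int) -> bool:
    """Is block[k] redundant w.r.t. an earlier identical assignment in the same block?"""
    occ = block[k]
    # An assignment such as x := x + 1 modifies its own operand: never redundant.
    if occ.lhs in occ.operands:
        return False
    for j in range(k - 1, -1, -1):
        instr = block[j]
        if instr == occ:
            return True   # nearest earlier occurrence reached without a blockade
        if instr.lhs == occ.lhs or instr.lhs in occ.operands:
            return False  # x or an operand of the right-hand side is modified
    return False          # no earlier occurrence inside this block


x_ab = Assign("x", "a+b", frozenset({"a", "b"}))
block = [x_ab, Assign("y", "x", frozenset({"x"})), x_ab]
print(locally_redundant(block, 2))  # True: the use of x in between does not matter
block[1] = Assign("a", "1", frozenset())
print(locally_redundant(block, 2))  # False: operand a is modified in between
```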

Intuitively, the reason for non-confluence is that the application of elimina- 
tions may reduce opportunities for assignment motion and vice versa. This 
is illustrated in Figure 7.1 where two different maximal results are possible 
that cannot be subjected to any further transformation. 

It should be particularly noted that the resulting programs G_1 and G_3 in this example are even incomparable in terms of the number of assignments being executed on program paths. For instance, in G_1 there are two occurrences of the assignment x := a + b on the path (1, 3, 5) as against only one in G_3. For the path (2, 3, 4), however, the situation is exactly the opposite.

7.2.2 Consistent Motion-Elimination Couples 

In this section we impose a condition on the interference of motion and elim- 
ination transformations that is sufficient to establish local confluence. This 
condition, which we will call consistency, essentially means that both eliminations and assignment motions preserve the potential for eliminations:

Definition 7.2.1 (Consistent UMECs). A UMEC (T_am, T_el) is consistent, or for short a CUMEC, if for any α, β ∈ AP(G) (not necessarily different) the following conditions are satisfied:
Fig. 7.1. Non-confluence of the combination of assignment sinkings and locally redundant assignment eliminations

1. Eliminations preserve the potential for other eliminations, i. e. for any elimination EL_β ≡ G ↦_β G' ∈ T_el and every occurrence occ_α ∈ Occ_α(G) we have:

$$\mathit{eliminable}(G, occ_\alpha) \;\Rightarrow\; \forall\, occ'_\alpha \in \mathit{Dst}_{EL_\beta}(occ_\alpha).\ \mathit{eliminable}(EL_\beta(G), occ'_\alpha)$$

2. Assignment motions preserve the potential for eliminations, i. e. for any assignment motion AM_β ≡ G ↦_β G' ∈ T_am and occurrence occ_α ∈ Occ_α(G) we have¹:

$$\mathit{eliminable}(G, occ_\alpha) \;\Leftrightarrow\; \forall\, occ'_\alpha \in \mathit{Dst}_{AM_\beta}(occ_\alpha).\ \mathit{eliminable}(AM_\beta(G), occ'_\alpha)$$

¹ Note that in contrast to point (1) this assumption is an equivalence. Implicitly, this means that non-eliminability is preserved at least partially.



If only Condition 1 of Definition 7.2.1 is fulfilled, the UMEC is called weakly consistent.

Before presenting the main result of this section, namely that consistency implies local confluence, it should be noted that consistency does not force the stronger type of modular confluence, as illustrated by means of PDCE in Figure 7.2. Besides that, this example provides deeper insight into the technically difficult proof of Case 6 in the forthcoming Theorem 7.2.1. The point of this example is that the dead assignment at node 2 can be subjected both to a dead assignment elimination and to an assignment sinking. Whereas in the latter case the moved assignment remains dead and can be eliminated afterwards, in the former case, however, this still requires the application of both an assignment sinking and a dead assignment elimination. The central result then is:²

Theorem 7.2.1 (Consistency Theorem). A CUMEC (T_am, T_el) is locally confluent, i. e. whenever two transformations TR_1, TR_2 ∈ T_am ∪ T_el are applicable to a program G_0 ∈ 𝒢, then there exists a program G_3 such that the diagram can be completed as shown below³:

[Diagram: TR_1 leads from G_0 to G_1 and TR_2 from G_0 to G_2; both branches are completed towards a common program G_3.]

Proof. Without loss of generality we can assume that T_am refers to assignment sinkings. The proof proceeds by an exhaustive case analysis investigating all possible choices for TR_1 and TR_2.

Case 1: TR_1 ≡ G_0 ↦_α G_1 and TR_2 ≡ G_0 ↦_α G_2 are both α-eliminations.

In this case condition (1) of Definition 7.2.1 ensures that the elimination potential for the α-eliminations of TR_1 is preserved by TR_2 and vice versa.



² In [GKL+96] an alternative proof of the confluence of PDCE can be found, which is based on the concept of delay-monotonicity.

³ Here and in the following illustrations the “arrows” are labelled by their associated transformation names, which are emphasised by a grey circle.
Fig. 7.2. Non-modular confluence of PDCE



Since the MEC is assumed to be uniform, there are elimination transformations TR_3 and TR_4 that eliminate exactly the remaining α-occurrences that are also eliminated by TR_2 and TR_1, respectively. In fact, there is even a “direct” elimination transformation from G_0 to G_3.
Case 2: TR_1 ≡ G_0 ↦_α G_1 and TR_2 ≡ G_0 ↦_α G_2 are both α-assignment sinkings.

An assignment motion TR_5 that dominates both TR_1 and TR_2 can be constructed in the following way: TR_5(G_0) results from G_0 by removing every original occurrence of α and inserting new occurrences at program points where justification with respect to the original occurrences would be violated by further movement, i. e.

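$$TR_5\text{-}\mathit{Insert}^{\alpha}_{\dot n} \;\Leftrightarrow_{df}\; \mathit{OrigJust}^{\alpha}_{\dot n} \wedge \big(\mathit{Blocked}^{\alpha}_{\dot n} \vee \exists\, \dot m \in \mathit{succ}(\dot n).\ \neg\mathit{OrigJust}^{\alpha}_{\dot m}\big),$$

where OrigJust^α_ṅ is a slight variant of the justification predicate of Definition 6.1.2. As opposed to justification, this predicate is not parameterised by an assignment sinking, but rather takes all original occurrences into account:

$$\mathit{OrigJust}^{\alpha}_{\dot n} \;\Leftrightarrow_{df}\; \forall\, p \in \mathbf{P}[\dot s, \dot n]\ \exists\, 1 \le i < |p|.\ \mathit{AssOcc}^{\alpha}_{p_i} \wedge \forall\, i < j < |p|.\ \neg\mathit{Blocked}^{\alpha}_{p_j}$$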

It can easily be checked that TR_5 defines an admissible assignment sinking. Moreover, we can establish assignment sinkings TR_3 and TR_4 such that the corresponding diagram commutes.




Case 3: TR_1 ≡ G_0 ↦_α G_1 and TR_2 ≡ G_0 ↦_β G_2 are an α-elimination and a β-elimination, α ≠ β.

In this case condition (1) of Definition 7.2.1 ensures that the elimination potential for β-eliminations is preserved by TR_1, while the potential for α-eliminations is preserved by TR_2. Exploiting uniformity we have straightforward elimination transformations TR_3 ≡ G_1 ↦_β G_3 and TR_4 ≡ G_2 ↦_α G_3 completing the diagram in a modular way.
Case 4: TR_1 ≡ G_0 ↦_α G_1 and TR_2 ≡ G_0 ↦_β G_2 are an α-assignment motion and a β-assignment motion, α ≠ β.

If α and β are non-blocking, modular confluence follows trivially. Otherwise, the definition of admissible assignment motions (cf. Definition 6.1.2) ensures that no instances of β are inserted into the region of TR_1-justified program points. Hence TR_1 and TR_2 can straightforwardly be sequentialised in an arbitrary order.




Case 5: TR_1 ≡ G_0 ↦_α G_1 is an α-assignment motion and TR_2 ≡ G_0 ↦_β G_2 a β-elimination, α ≠ β.

Due to assumption (2) in Definition 7.2.1, assignment motion TR_1 preserves the opportunities for β-eliminations. Hence we can choose TR_3 ≡ G_1 ↦_β G_3 as the transformation that eliminates exactly the G_1-occurrences in Dst_{TR_1}(occ_β), where the corresponding G_0-occurrences occ_β are eliminated by TR_2. The argument for the construction of TR_4 ≡ G_2 ↦_α G_3 is similar to the one in Case 4. Obviously, β-eliminations do not add blockades within the range of TR_1-justified program points.




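Case 6: TR_1 ≡ G_0 ↦_α G_1 is an α-assignment motion and TR_2 ≡ G_0 ↦_α G_2 an α-elimination. In this case the situation gets more intricate. Here we are going to show that there are a program G'_2 and transformations

TR_3 ≡ G_1 ↦_α G_3 (an elimination)

TR_4 ≡ G_2 ↦_α G'_2 (an assignment motion)

TR_5 ≡ G'_2 ↦_α G_3 (an elimination)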
such that the resulting diagram commutes (for a concrete example reconsider Figure 7.2).




In the following we elaborate the construction of these transformations in detail. Two sets of α-occurrences play a central role:

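— The set of G_0-occurrences of α that are not eliminated by TR_2:

$$S =_{df} \{\, occ_\alpha \in \mathit{Occ}_\alpha(G_0) \mid \mathit{Dst}_{TR_2}(occ_\alpha) \neq \emptyset \,\}$$

— The set of G_1-occurrences of α whose source occurrences do not participate in TR_2:

$$T =_{df} \{\, occ_\alpha \in \mathit{Occ}_\alpha(G_1) \mid \mathit{Src}_{TR_1}(occ_\alpha) \subseteq S \,\}$$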

As a first result we have

$$\forall\, occ_\alpha \in \mathit{Occ}_\alpha(G_0).\ occ_\alpha \notin S \Rightarrow \mathit{eliminable}(G_0, occ_\alpha) \tag{7.1}$$

$$\forall\, occ_\alpha \in \mathit{Occ}_\alpha(G_1).\ occ_\alpha \notin T \Rightarrow \mathit{eliminable}(G_1, occ_\alpha) \tag{7.2}$$

Implication (7.1) is trivial. To show implication (7.2) as well, consider a G_1-occurrence occ_α ∉ T. According to the construction of T there exists at least one occ'_α ∈ Src_{TR_1}(occ_α) such that occ'_α ∉ S. Due to (7.1) this implies eliminable(G_0, occ'_α). Using the ⇒-direction of assumption (2) in Definition 7.2.1 together with the fact that occ_α ∈ Dst_{TR_1}(occ'_α) finally delivers eliminable(G_1, occ_α).

Then TR_3, TR_4 and TR_5 are constructed as follows:

Construction of TR_3: Elimination transformation TR_3 is easy to describe: it eliminates all occurrences of α that are not elements of T. This construction is sound due to (7.2) and the assumption of uniformity.

Construction of TR_4: This is the crucial point in the construction of this case. As a first step let us identify S and T with their corresponding program points in G_2. Then I shall denote the set of program points in between S and T:
$$\dot n \in I \;\Leftrightarrow_{df}\; \exists\, p \in \mathbf{P}.\ p_1 \in S \wedge p_{|p|} \in T \wedge \exists\, 1 \le j < |p|.\ p_j = \dot n$$

Informally, I is the set of intermediate program points along which a relevant TR_1-movement is propagated, i. e. a movement that is not just the result of a propagated elimination. Now TR_4 is determined through the transformation that

— removes all S-associated occurrences of G_2,

— inserts occurrences at program points ṅ satisfying

$$\dot n \in T \quad\text{or} \tag{7.3a}$$

$$\dot n \notin I \wedge \exists\, \dot m \in \mathit{pred}(\dot n).\ \dot m \in I \tag{7.3b}$$

Hence TR_4 coincides with TR_1 for the T-associated insertions, while introducing new insertions at program points leaving I without reaching T. This situation is roughly sketched in Figure 7.3(a).




Fig. 7.3. (a) Illustrating the construction of TR_4 (b) Justification of insertions at the border of I



Next, we have to convince ourselves that TR_4 indeed defines an admissible assignment motion. First, the removed occurrences of α are substituted, since every path leading from a site with a removed S-associated occurrence eventually passes an insertion site of TR_4. Thus it only remains to show that the inserted occurrences are justified as well. For the T-stipulated insertions, justification simply carries over from the admissibility of TR_1.⁴ To show that the insertions at the border of I are justified as well, it is sufficient to prove that Condition (7.3b) already implies ∀ ṁ ∈ pred(ṅ). ṁ ∈ I, which guarantees that ṅ can only be entered from the inside of I. To this end, let us assume a program point ṅ that meets the requirement of Condition (7.3b). By definition ṅ has a predecessor ṁ that is an element of I. By the construction of I this implies that ṁ has another successor ṅ' that is an element of I ∪ T. This, however, excludes that ṅ has other predecessors besides ṁ, since this would imply that the edge (ṁ, ṅ) is a critical one (see Figure 7.3(b) for illustration).

⁴ Note that by construction the insertions at the border of I do not introduce blockades for the T-based insertions.

Construction of TR_5: Like TR_3, the transformation TR_5 is simply determined to eliminate all occurrences of α not contained in T.⁵ To prove that this definition is sound we have to show that any occurrence not in T can be eliminated in G'_2. Due to the assumption of uniformity this means that we are left to prove that

$$\forall\, occ_\alpha \in \mathit{Occ}_\alpha(G'_2).\ occ_\alpha \notin T \Rightarrow \mathit{eliminable}(G'_2, occ_\alpha) \tag{7.4}$$

To this end, we introduce an auxiliary program G_4 which results from G'_2 by reinserting the occurrences that have been eliminated by means of TR_2. In fact, we can establish the following relations between G_4 and the programs constructed before.




We now have to take a close look at these transformations in order to justify that they are actually well-defined:

— TR_6 ≡ G_0 ↦_α G_4: the insertions and removals of TR_6 are chosen exactly as in the construction of TR_4, which is sound as this transformation solely depends on source occurrences of S.

— TR_7 ≡ G_4 ↦_α G_1: G_1 coincides with G_4 for the insertions belonging to T. Moreover, no admissible assignment motion with argument program G_0 and T-associated insertions can also have insertions within the range of I as introduced in the construction of TR_4. Therefore, those insertions in G_1 that do not belong to T are justified with respect to the G_4-occurrences that do not belong to T.

— TR_8 ≡ G_4 ↦_α G'_2: this transformation eliminates all occurrences reinserted into the program G'_2. Due to uniformity it is enough to show that any reinserted occurrence occ_α ∈ Occ_α(G_4) is eliminable. By construction, occ_α is also a G_0-occurrence satisfying eliminable(G_0, occ_α). Then consistency and the admissibility of TR_6 yield that eliminable(G_4, occ_α) must hold as well.

⁵ We identify the G_1-occurrences of T with their counterparts in G'_2.

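For the proof of Property (7.4) let us assume an occurrence occ_α ∈ Occ_α(G'_2) \ T. Since occ_α is also a G_4-occurrence of α, the fact that occ_α is an occurrence outside of I delivers:

$$\mathit{Dst}_{TR_7}(occ_\alpha) \cap T = \emptyset \tag{7.5}$$

Then the proof can be completed as follows:

$$\begin{array}{lll}
 & \forall\, occ'_\alpha \in \mathit{Dst}_{TR_7}(occ_\alpha).\ occ'_\alpha \notin T & \text{[Property (7.5)]}\\
\Rightarrow & \forall\, occ'_\alpha \in \mathit{Dst}_{TR_7}(occ_\alpha).\ \mathit{eliminable}(G_1, occ'_\alpha) & \text{[Property (7.2)]}\\
\Rightarrow & \mathit{eliminable}(G_4, occ_\alpha) & \text{[Def. 7.2.1(2)]}\\
\Rightarrow & \mathit{eliminable}(G'_2, occ_\alpha) & \text{[Def. 7.2.1(1)]}
\end{array}$$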

Having defined the transformations, it is now easy to see that both sequences

1. TR_1; TR_3 and

2. TR_2; TR_4; TR_5

result in the common program G_3, which evolves from G_0 by removing all original occurrences, while inserting new occurrences corresponding to T. □

7.2.3 Confluence of Consistent Motion-Elimination Couples 

So far we have established local confluence for CUMECs. According to Newman’s Theorem, however, local confluence implies confluence only for terminating relations [New42]. Unfortunately, assignment motions are only terminating up to local reorderings of independent, i. e. non-blocking, assignments inside of basic blocks. Nonetheless, in this section we demonstrate that for our application local confluence is sufficient, since the reasoning on confluence can be transferred to equivalence classes of programs abstracting from local reorderings. To this end we introduce the following equivalence relation on programs, which reflects their similarity up to local rearrangements.

Definition 7.2.2 (Equivalent Programs). For an MEC (T_am, T_el) the equivalence of programs ≡ ⊆ 𝒢 × 𝒢 is defined by:

$$G \equiv G' \;\Leftrightarrow_{df}\; G \mapsto^{*} G' \wedge G' \mapsto^{*} G$$

Since by definition neither eliminations nor global assignment motions are 
reversible, equivalent programs are particularly easy to characterise: 
Lemma 7.2.1 (First ≡-Lemma). Let (T_am, T_el) be an MEC and G, G' ∈ 𝒢 such that G ≡ G'. Then G and G' only differ in the local ordering of occurrences of (non-blocking) assignment patterns within basic blocks.

Using this property we can easily see that elementary transformations are compatible with ≡ unless the potential of eliminations is affected by local reorderings. Hence:

Lemma 7.2.2 (Second ≡-Lemma). Let (T_am, T_el) be a weakly consistent UMEC and G, G', G'' ∈ 𝒢. Then

$$G \equiv G' \wedge G \mapsto G'' \;\Rightarrow\; \exists\, G''' \in \mathcal{G}.\ G' \mapsto G''' \wedge G'' \equiv G'''$$

As a consequence of Lemma 7.2.2 we may restrict our reasoning on ↦ to equivalence classes of programs, since the relation induced by ↦ does not depend on the particular choice of a representative. Denoting the equivalence class of G with respect to ≡ by [G]_≡ we define:

Definition 7.2.3 (∘→). Let (T_am, T_el) be an MEC. The relation ∘→ ⊆ 𝒢/≡ × 𝒢/≡ is defined by:

$$[G]_\equiv \mathrel{\circ\!\!\rightarrow} [G']_\equiv \;\Leftrightarrow_{df}\; G \mapsto G' \wedge G \not\equiv G'$$
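For a finite, explicitly given fragment of ↦, the classes of ≡ and the relation ∘→ can be computed directly. In the following Python sketch (hypothetical names; programs are opaque labels, and the example steps are invented), ≡ is mutual reachability, as in Definition 7.2.2, and a step is kept in the quotient only if it leads to a different class, which is what makes ∘→ irreflexive.

```python
from typing import Dict, FrozenSet, Hashable, Set

Prog = Hashable


def reach(rel: Dict[Prog, Set[Prog]], a: Prog) -> Set[Prog]:
    """Programs reachable from a in zero or more steps (reflexive-transitive closure)."""
    seen, stack = {a}, [a]
    while stack:
        for b in rel.get(stack.pop(), set()):
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def quotient(rel: Dict[Prog, Set[Prog]]):
    """Classes of mutual reachability and the induced relation between classes."""
    elems = set(rel) | {b for bs in rel.values() for b in bs}
    r = {a: reach(rel, a) for a in elems}
    cls = {a: frozenset(b for b in elems if b in r[a] and a in r[b]) for a in elems}
    step: Dict[FrozenSet[Prog], Set[FrozenSet[Prog]]] = {}
    for a, bs in rel.items():
        for b in bs:
            if cls[a] != cls[b]:  # keep G |-> G' only if G and G' are not equivalent
                step.setdefault(cls[a], set()).add(cls[b])
    return cls, step


# Hypothetical example: G0 and G0' differ only by swapping two independent
# assignments (each reorders into the other); both can be reduced to G1.
rel = {"G0": {"G0'", "G1"}, "G0'": {"G0", "G1"}}
cls, step = quotient(rel)
print(len(set(cls.values())), len(step))  # 2 1
```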

Note that the term G ≢ G' forces irreflexivity of ∘→, which is important for termination. Then Lemma 7.2.2 provides the following correspondence between ↦ and ∘→:

Lemma 7.2.3 (∘→-Lemma). For a weakly consistent UMEC (T_am, T_el) we have

$$\forall\, G, G' \in \mathcal{G}.\ G \mapsto^{*} G' \;\Leftrightarrow\; [G]_\equiv \mathrel{\circ\!\!\rightarrow}^{*} [G']_\equiv$$

An immediate consequence of this lemma is:

Corollary 7.2.1. Let (T_am, T_el) be a weakly consistent UMEC. Then

$$\mapsto \text{ locally confluent} \;\Rightarrow\; \mathrel{\circ\!\!\rightarrow} \text{ locally confluent}$$

Combining the previous results and using the obvious termination of ∘→, we finally obtain as the main result:

Theorem 7.2.2 (Confluence of CUMECs). For a given CUMEC (T_am, T_el) the relation ↦ is confluent.

Proof. Let us assume a situation as sketched in Figure 7.4(a), where programs G_1 and G_2 result from G_0 by the application of an arbitrary number of ↦-steps. Then we have to show that there is a program G_3 which completes the diagram as displayed in Figure 7.4(a).







7. A Framework for Assignment Motion Based Program Transformations 








Fig. 7.4. Illustrating confluence



Due to Theorem 7.2.1, consistency implies local confluence of ↦. According to Corollary 7.2.1 this also establishes local confluence of the corresponding relation ∘→. Since ∘→ is terminating, ∘→ is also confluent according to Newman’s Theorem [New42]. This can be used to complete the diagram of Figure 7.4(a) along the following lines. Due to Lemma 7.2.3 the initial situation⁶ of Figure 7.4(a) carries over to a corresponding initial situation in Figure 7.4(b), where [G_0]_≡ ∘→* [G_1]_≡ and [G_0]_≡ ∘→* [G_2]_≡ hold. Using the confluence of ∘→, there is a class [G_3]_≡ ∈ 𝒢/≡ such that the diagram of Figure 7.4(b) commutes. Applying Lemma 7.2.3 once more also yields the completion of the diagram in Figure 7.4(a). □
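The role that termination plays in this proof can be explored on small abstract rewriting systems. The following Python sketch (not part of the book's development; all names are hypothetical) tests termination, local confluence and confluence of a finite relation given as a successor map. The second example is the classical one showing that local confluence without termination does not imply confluence, which is why the argument above is carried out for the terminating relation ∘→ instead of ↦.

```python
from typing import Dict, Hashable, Set

State = Hashable
Rel = Dict[State, Set[State]]


def reach(rel: Rel, a: State) -> Set[State]:
    """States reachable from a in zero or more steps."""
    seen, stack = {a}, [a]
    while stack:
        for b in rel.get(stack.pop(), set()):
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def joinable(rel: Rel, b: State, c: State) -> bool:
    """b and c have a common reduct."""
    return bool(reach(rel, b) & reach(rel, c))


def terminating(rel: Rel) -> bool:
    """A finite relation terminates iff no edge a -> b can lead back to a."""
    return not any(a in reach(rel, b) for a, bs in rel.items() for b in bs)


def locally_confluent(rel: Rel) -> bool:
    """Every one-step peak b <- a -> c is joinable."""
    return all(joinable(rel, b, c) for bs in rel.values() for b in bs for c in bs)


def confluent(rel: Rel) -> bool:
    """Every many-step peak b <-* a ->* c is joinable."""
    states = set(rel) | {b for bs in rel.values() for b in bs}
    return all(joinable(rel, b, c)
               for a in states for b in reach(rel, a) for c in reach(rel, a))


# Terminating and locally confluent, hence confluent (Newman's Lemma).
diamond = {"G0": {"G1", "G2"}, "G1": {"G3"}, "G2": {"G3"}}
# Locally confluent but not terminating: from a both normal forms c and d are reachable.
loop = {"a": {"b", "c"}, "b": {"a", "d"}}
print(terminating(diamond), locally_confluent(diamond), confluent(diamond))  # True True True
print(terminating(loop), locally_confluent(loop), confluent(loop))           # False True False
```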

The Universe of an MEC The latter theorem particularly gives rise to a 
characterisation of maximal programs in a universe that can be reached from 
an initial program. Therefore, we define: 

Definition 7.2.4 (G-Universe of an MEC). The G-universe of an MEC 𝓜 is given by

$$\mathcal{U}_{\mathcal{M}}(G) =_{df} \{\, G' \in \mathcal{G} \mid G \mapsto^{*} G' \,\}$$

Using Theorem 7.2.2 it is easy to see that maximal elements of the universe are unique up to local reorderings:

Corollary 7.2.2. Let (T_am, T_el) be a CUMEC. Then all programs that are maximal up to local reorderings in 𝒰_𝓜(G), i. e. programs G' such that

$$\forall\, G'' \in \mathcal{G}.\ G' \mapsto G'' \Rightarrow G' \equiv G'',$$

are ≡-equivalent.

Obviously, for a given MEC 𝓜 the relation ↦* is a preorder on 𝒰_𝓜(G) whose kernel is given by ≡. In addition, we also have preorders to measure the quality of programs within 𝒰_𝓜(G). In accordance with the corresponding definitions in Section 3.2.1 and Section 4.2.2, we introduce a “computationally better” preorder ⊑, which, for G', G'' ∈ 𝒰_𝓜(G), is defined by:⁷

⁶ That is, the non-dashed part.

⁷ For G' ∈ 𝒰_𝓜(G) we denote the path in G' that corresponds to p ∈ P_G[s, e] by p_{G'}.
• G' ⊑ G'' if and only if for any p ∈ P_G[s, e] the number of computations occurring on p_{G'} is equal to or less than the number of computations on p_{G''}.⁸

Analogously, also an “assignment-better” preorder can be defined: 

• G' ⊑_ass G'' if and only if for any p ∈ P_G[s, e] the number of assignments occurring on p_{G'} is equal to or less than the number of assignments on p_{G''}.⁹
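Both preorders compare programs path by path, so a program is better only if it wins or ties on every path. The minimal Python sketch below assumes that the paths of interest are finitely enumerated and that the per-path assignment counts of each program are already known; programs with loops would require a symbolic treatment of their infinitely many paths. The function name and the counts are hypothetical; the counts merely mimic the incomparability of G_1 and G_3 discussed for Figure 7.1.

```python
from typing import Dict, Hashable

PathId = Hashable


def assignment_better(g_a: Dict[PathId, int], g_b: Dict[PathId, int]) -> bool:
    """G_a ⊑_ass G_b: on every enumerated path, G_a has at most as many assignments as G_b.

    Both dictionaries are assumed to contain one entry per path of the original program.
    """
    return all(g_a[p] <= g_b[p] for p in g_b)


# Hypothetical per-path assignment counts mimicking the situation of Figure 7.1.
g1 = {(1, 3, 5): 2, (2, 3, 4): 1}
g3 = {(1, 3, 5): 1, (2, 3, 4): 2}
print(assignment_better(g1, g3), assignment_better(g3, g1))  # False False
```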

Finally, we would like to point out a quite obvious result that applies to preorders that harmonise with ↦*:

Lemma 7.2.4 (Optimality Lemma). Let 𝓜 be an MEC and ≼ ⊆ 𝒢 × 𝒢 a preorder such that ↦ ⊆ ≼.¹⁰ Then, if G' is ↦*-optimal in 𝒰_𝓜(G), G' is also ≼-optimal in 𝒰_𝓜(G).



7.2.4 The Applications 

In this section we apply the results of Section 7.2.2 and Section 7.2.3 to the 
MECs presented in Chapter 6. Here we have:

Theorem 7.2.3 (Confluence of PDCE, PFCE and PRAE).

1. PDCE is a confluent UMEC.

2. PFCE is a confluent UMEC.

3. PRAE is a confluent UMEC.

Proof. All proofs are applications of Theorem 7.2.2. Obviously, PDCE, PFCE and PRAE are UMECs. Therefore, it remains to show that consistency is fulfilled in each case.

Consistency of PDCE: Let us consider an occurrence occ_α of an assignment pattern α ≡ x := φ at a program point ṅ of G. Starting with the first condition of Definition 7.2.1, we have to show that deadness of occ_α is preserved if the occurrence survives a dead assignment elimination. Therefore, let us assume a dead assignment elimination EL_β ≡ G ↦_β G' ∈ DAE with respect to the assignment pattern β ≡ y := ψ that does not eliminate occ_α. According to the definition of deadness (cf. Definition 6.2.1), on every path leading from ṅ to ė a use site of x must be preceded by a site at which x is modified. More specifically, this means:

$$\forall\, p \in \mathbf{P}[\dot n, \dot e],\ 1 < i \le |p|.\ \mathit{LhsUsed}_{p_i} \Rightarrow \exists\, 1 < j < i.\ \mathit{LhsMod}_{p_j} \tag{7.6}$$

We have to show that this condition still holds after applying the elimination transformation EL_β. Therefore, let us assume a path p ∈ P[ṅ, ė] with a use site at node p_i (1 < i ≤ |p|) such that LhsUsed_{p_i} holds.¹¹ According to Condition (7.6) there exists an index 1 < j < i such that p_j satisfies LhsMod_{p_j}. Let us choose j as the greatest among all such indices. Obviously, p_j is then associated with an assignment pattern x := ..., which is not dead, since it is used at point p_i without a modification of x in between. Hence this occurrence is not eliminated by means of EL_β, which finally completes the argument, as it forces the preservation of occ_α's deadness.

⁸ The number of computations is determined by the number of operator symbols.

⁹ The number of assignments is determined by the number of operator symbols.

¹⁰ Informally, this means that no elementary step can impair the quality of programs.

Next the second condition of Definition 7.2.1 has to be proved. Therefore, let us assume an admissible assignment sinking AS_β ∈ AAS, and show

i) that all occurrences in Dst_{AS_β}(occ_α) are dead if occ_α is dead, and

ii) that at least one occurrence in Dst_{AS_β}(occ_α) is not dead if occ_α is not dead either.

Since the line of argumentation is similar to the first part of the proof, we concentrate on sketching the essentials rather than going into technical details. Starting with (i), let us assume that occ_α is dead at program point ṅ. Without loss of generality let us further consider a program path from ṅ to ė with a use site of x at a point strictly following ṅ. According to (7.6) the left-hand side variable x has to be modified in between ṅ and this use site. However, a modification site of x particularly means a blockade for the movement of occ_α. Since this blockade is still present after the application of AS_β, deadness is preserved for all occurrences in Dst_{AS_β}(occ_α).¹²

Point (ii) is proved analogously. Assuming that occ_α is alive yields that ṅ is followed by a use site of x on some program path p such that no modification of x occurs in between. Since the instruction at the use site especially blocks occ_α, this implies that some occurrence in Dst_{AS_β}(occ_α) that lies on p_{AS_β} is still alive.
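Condition (7.6), on which the consistency argument for PDCE rests, can be checked mechanically when the paths from the occurrence to the end node are enumerated explicitly, for instance in an acyclic example. The following Python sketch assumes that each node on a path is described by the set of variables it uses and the set it modifies, and that uses within a node happen before its modification (so x := x + 1 counts as a use first); the encoding and function names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# A node on a path, described by (variables used, variables modified).
PathNode = Tuple[FrozenSet[str], FrozenSet[str]]


def dead_along(path: List[PathNode], x: str) -> bool:
    """Condition (7.6) on one path p_1 ... p_|p|, where p_1 holds x := phi."""
    for uses, mods in path[1:]:
        if x in uses:
            return False  # x is used before any redefinition: not dead on this path
        if x in mods:
            return True   # x is redefined first: later uses see the new value
    return True           # x is not used on the remainder of this path


def dead(paths: List[List[PathNode]], x: str) -> bool:
    """Deadness requires (7.6) on every path from the occurrence to the end node."""
    return all(dead_along(p, x) for p in paths)


assign_x = (frozenset({"a"}), frozenset({"x"}))  # x := a   (the occurrence itself)
redef_x = (frozenset({"b"}), frozenset({"x"}))   # x := b
use_x = (frozenset({"x"}), frozenset({"y"}))     # y := x
print(dead([[assign_x, redef_x, use_x]], "x"))                     # True
print(dead([[assign_x, redef_x, use_x], [assign_x, use_x]], "x"))  # False
```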

Consistency of PFCE: In the case of PFCE the first condition of Definition 7.2.1 is proved almost identically to its counterpart for PDCE. Assuming a faint assignment α ≡ x := φ at program point ṅ, the only essential difference is that we are faced with an additional case according to Definition 6.3.1. Hence the counterpart of Condition (7.6) now reads as:

$$\forall\, p \in \mathbf{P}[\dot n, \dot e]\ \forall\, 1 < i \le |p|.\ \mathit{LhsUsed}_{p_i} \Rightarrow \underbrace{\big(\exists\, 1 < j < i.\ \mathit{LhsMod}_{p_j}\big)}_{(i)} \vee \underbrace{\mathit{Faint}^{\beta}_{p_i}}_{(ii)} \tag{7.7}$$

with β addressing the occurrence associated with node p_i. In fact, in the case that part (i) of the conclusion in (7.7) applies, the argument is exactly the same as in the case of PDCE. Otherwise, part (ii) can do no harm either, because the elimination of the faint occurrence of β at p_i could at most reduce the cases in which the premise LhsUsed_{p_i} is true. Hence the implication remains valid after performing a faint assignment elimination. The second condition of Definition 7.2.1 is proved completely along the lines of the PDCE case.

¹¹ The case in which there is no use site on path p trivially carries over to G'.

¹² That means, if occ_{β,p} denotes the path-occurrence that blocks the path-occurrence occ_{α,p}, then Dst_{AS_β}(occ_{β,p}) still blocks Dst_{AS_β}(occ_{α,p}) and both occurrences precede any use site of x on the path p_{AS_β}.

Consistency of PRAE: Starting with the first condition of Definition 7.2.1, let us assume an occurrence occ_α of α at a program point ṅ being redundant. According to Definition 6.4.1 this means:

$$\forall\, p \in \mathbf{P}[\dot s, \dot n]\ \exists\, 1 \le i < |p|.\ \mathit{AssOcc}^{\alpha}_{p_i} \wedge \underbrace{\forall\, i < j < |p|.\ \neg\mathit{LhsMod}_{p_j} \wedge \forall\, i < j \le |p|.\ \neg\mathit{RhsMod}_{p_j}}_{(\star)} \tag{7.8}$$

Let us now assume a redundant assignment elimination EL_β ≡ G ↦_β G' ∈ RAE. Then we have to show that occ_α remains redundant if it survives the application of EL_β. Since EL_β does not add assignments, part (★) of (7.8) is preserved on every path. Thus only the case where an α-occurrence at node p_i is eliminated is of interest. In this case, however, Condition (7.8) applies equally to p_i in place of ṅ, which establishes redundancy of occ_α in G', too.

Similarly, the ⇒-direction of the second condition of Definition 7.2.1 follows immediately from the path characterisation in (7.8), using the fact that the p_i-occurrences of α are blockades of occ_α, a fact which is preserved for the destination-occurrences during assignment hoisting. Complementarily, by means of (7.8) it is easy to see that non-redundancy of occ_α is preserved on at least one program path, which finally also provides the ⇐-direction of Definition 7.2.1(2). □
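Condition (7.8) can likewise be checked on small acyclic examples by enumerating all paths from the start node to ṅ. The Python sketch below uses a hypothetical instruction encoding and helper names, and the restriction to acyclic flow graphs is an assumption that keeps the set of paths finite. On each path it searches backwards for the nearest identical assignment and rejects the path as soon as x or an operand of the right-hand side is modified in between, which agrees with the existential formulation of (7.8) because any farther occurrence would have to pass the same modification.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Optional


class Instr(NamedTuple):
    lhs: Optional[str]        # variable written, or None for a skip
    rhs: str                  # right-hand side as text
    operands: FrozenSet[str]  # variables read by the right-hand side


def all_paths(succs: Dict[str, List[str]], s: str, n: str) -> List[List[str]]:
    """All paths from s to n; the flow graph is assumed to be acyclic."""
    if s == n:
        return [[n]]
    return [[s] + rest for m in succs.get(s, []) for rest in all_paths(succs, m, n)]


def redundant_on(path: List[str], code: Dict[str, Instr]) -> bool:
    """(7.8) on one path ending at n: scanning backwards, an identical assignment
    must be met before x or an operand of its right-hand side is modified."""
    alpha = code[path[-1]]
    if alpha.lhs in alpha.operands:  # RhsMod at p_|p| itself, e.g. x := x + 1
        return False
    for node in reversed(path[:-1]):
        instr = code[node]
        if instr == alpha:
            return True
        if instr.lhs is not None and (instr.lhs == alpha.lhs or instr.lhs in alpha.operands):
            return False
    return False


def redundant(succs: Dict[str, List[str]], code: Dict[str, Instr], s: str, n: str) -> bool:
    """Redundancy at n: (7.8) must hold on every path from s to n."""
    return all(redundant_on(p, code) for p in all_paths(succs, s, n))


skip = Instr(None, "", frozenset())
x_ab = Instr("x", "a+b", frozenset({"a", "b"}))
succs = {"s": ["1"], "1": ["2", "3"], "2": ["4"], "3": ["4"]}
code = {"s": skip, "1": skip, "2": x_ab, "3": x_ab, "4": x_ab}
print(redundant(succs, code, "s", "4"))  # True: x := a+b precedes node 4 on both paths
```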



7.3 The Complexity of Exhaustive Iterations 

In the previous section we gave a sufficient condition for the confluence of a UMEC. Since ↦ is terminating up to local reorderings, the optimisation potential can always be exhausted completely. However, we have not yet answered the question of what a suitable exhaustive iteration sequence costs. Although exhaustive approaches to partial redundancy elimination based on assignment motion have been proposed and implemented before, no serious estimate of their complexity has been given. Dhamdhere [Dha91] briefly
discusses assignment hoisting as an instrument to enhance partially redun- 
dant expression elimination, but does not address its complexity. Similarly, 
Dhamdhere, Rosen and Zadeck [DRZ92] discuss the presence of second or-
der effects in an algorithm for partial redundancy elimination that employs 
sparse slot-wise iteration techniques as opposed to bit-vector iterations. However, they neither provide an argument for stabilisation of the process nor
address the run-time behaviour. Besides, there are a few implementations that 
report the usage of an iterated version of Morel’s and Renvoise’s algorithm 
[MR79]. These implementations employ a restricted kind of assignment hoist- 
ing based on the idea that large expressions are decomposed following a strict 
naming discipline: the same right-hand side expressions are always assigned 
to a unique symbolic register. This way expression motion of large expres- 
sions is simulated by assignment motion, a phenomenon we will also exploit 
by our algorithm. The most prominent examples are probably implementa- 
tions that are based on the work of Chow [Cho83]. Moreover, Rosen, Wegman 
and Zadeck [RWZ88] also mention the PL.8 compiler of IBM [AH82] as an implementation that iterates Morel’s and Renvoise’s algorithm.¹³ In fact, for
acyclic control flow they even state a linear asymptotic upper bound on the 
number of iteration steps, which, however, is not further elucidated. 

Besides there are alternative approaches based on special intermediate rep- 
resentations of dependencies like SSA-form [RWZ88, BC94], the program de- 
pendency graph [FOW87, FKCX94], and the dependency flow graph [JP93]. 
Common to all approaches is that they are limited in capturing second order 
effects. For example, the SSA-based approaches of [RWZ88, BC94] introduce 
rankings among the occurrences that are moved, which only guarantees a 
reasonable ordering for acyclic programs. 

In this chapter we examine the complexity of exhaustive iteration sequences 
in a UMEC. Essentially, there are two levels of iteration that contribute to the 
overall transformation. The inner level is caused by the fixed point iterations 
of global data flow analyses which are employed for gathering information 
to be used for the component transformations. In contrast, the outer level 
is forced by the exhaustive application of the elementary transformations. 
Whereas the complexity of the inner iterations is well-understood (cf. Section 
4.6), the complexity of the outer iteration process has not been investigated 
yet. 

In this chapter we therefore consider the penalty costs of the outer iteration process, i. e. the additional factor that is caused by the outer iteration compared to a one-step application of the component transformations. In essence, a one-step application means the simultaneous application of a certain type of transformation to all code patterns of interest. It turns out that a quadratic penalty factor always suffices. Nonetheless, such a compile-time penalty is hardly acceptable in practice. However, most practically relevant UMECs do stabilise considerably faster. In fact, for PDCE, which is implemented in Release 4 of the Sun SPARCompiler,¹⁴ measurements indicate that the average iteration count is only 4 to 5 [Gro95].¹⁵ In this chapter we present the theoretical foundation giving evidence for this observation. All our applications are shown to fall into a class of UMECs where the penalty factor is linear and can even be expected to be reasonably small in practice. The central property that characterises this class is that all elimination-elimination effects are restricted to initial ones. Surprisingly, as opposed to the inner level of iteration, the penalty factors are independent of the branching structure of a program, i. e. there is no difference in the iteration behaviour of simple straight-line programs and intricate irreducible programs.

¹³ Unfortunately, this does not seem to be reported in a first-hand reference.

7.3.1 Simultaneous Iteration Steps 

To give a reasonable estimate of the costs that are associated with an
exhaustive iteration sequence, we have to choose the basic components
appropriately. In practice, the elementary transformations are typically
implemented by means of bit-vector algorithms (cf. Section 4.6). Nonetheless, even
for non-bit-vector analyses like the elimination of faint assignments, we
consider iteration steps that apply elementary transformations with respect to all
assignment patterns. In a more abstract view we impose two requirements
reflecting the sketched situation:

— The elementary transformations have to be reasonably powerful in order
to avoid unnecessarily slow convergence.

— All transformations of a given type are assumed to be performed
simultaneously for all assignment patterns of a program.

More formally, the first requirement is captured by the usage of maximal
component transformations which, at least for the class of UMECs, are
guaranteed to exist.

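Maximal Transformations. Let $(\mathcal{T}_{am}, \mathcal{T}_{el})$ be a UMEC and $\alpha \in \mathcal{AP}(G)$.
A maximal assignment motion $\mathrm{AM}^\mu_\alpha(G) \in \mathcal{T}_{am}$ and a maximal elimination
$\mathrm{EL}^\mu_\alpha(G) \in \mathcal{T}_{el}$ with respect to $\alpha$ are determined in the following way:

1. $\mathrm{AM}^\mu_\alpha(G)$ is constructed as described in Case 2 of the proof of Theorem 7.2.2.

2. $\mathrm{EL}^\mu_\alpha(G)$ results from $G$ by eliminating every $G$-occurrence $occ_\alpha$ of $\alpha$
satisfying $\mathit{eliminable}(G, occ_\alpha)$.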



SPARCompiler is a registered trademark of SPARC International, Inc., and is
licensed exclusively to Sun Microsystems, Inc.

That means 4 or 5 iterations of both assignment motion and eliminations. 

In practice, one can speed up the simultaneous iterations by identifying those 
patterns that are candidates for a successful transformation in advance. However, 
this does not reduce the costs in terms of worst-case computational complexities. 




174 7. A Framework for Assignment Motion Based Program Transformations 



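It is easy to see that both $\mathrm{AM}^\mu_\alpha(G)$ and $\mathrm{EL}^\mu_\alpha(G)$ are well-defined, i.e.
$G \mapsto_\alpha \mathrm{AM}^\mu_\alpha(G) \in \mathcal{T}_{am}$ and $G \mapsto_\alpha \mathrm{EL}^\mu_\alpha(G) \in \mathcal{T}_{el}$. Obviously, $\mathrm{AM}^\mu_\alpha$ and $\mathrm{EL}^\mu_\alpha$ can be
regarded as functions mapping programs to programs.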

For our reasoning only one property of maximal transformations is relevant:
if the maximal transformations get stuck, any other transformation
gets stuck as well. More specifically, we have:

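Lemma 7.3.1 (Stability Lemma). Let $(\mathcal{T}_{am}, \mathcal{T}_{el})$ be a weakly consistent
UMEC. Then

1. $\mathrm{AM}^\mu_\alpha(G) = G \;\Rightarrow\; \forall\, \mathrm{AM}_\alpha \equiv G \mapsto_\alpha G' \in \mathcal{T}_{am}.\ \mathrm{AM}_\alpha(G) = G$

2. $\mathrm{EL}^\mu_\alpha(G) = G \;\Rightarrow\; \forall\, \mathrm{EL}_\alpha \equiv G \mapsto_\alpha G' \in \mathcal{T}_{el}.\ \mathrm{EL}_\alpha(G) = G$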

Simultaneous Transformations. In the proof of Case 3 and Case 4 of
Theorem 7.2.1 we saw how in a weakly consistent UMEC two transformations
of the same type operating on distinct assignment patterns can be
performed in arbitrary order without affecting each other. This construction
can straightforwardly be extended to finite sets of elementary transformations.
Let $(\mathcal{T}_{am}, \mathcal{T}_{el})$ be a weakly consistent UMEC, $\mathcal{AP}(G) = \{\alpha_1, \dots, \alpha_k\}$
with $k \ge 1$, and $\mathrm{TR}_{\alpha_1}, \dots, \mathrm{TR}_{\alpha_k} \in \mathcal{T}_{am}$ or $\mathrm{TR}_{\alpha_1}, \dots, \mathrm{TR}_{\alpha_k} \in \mathcal{T}_{el}$ a family of
transformations operating on the argument program $G$. Then the program
that results from a simultaneous execution is denoted by:

$$(\mathrm{TR}_{\alpha_1} \parallel \dots \parallel \mathrm{TR}_{\alpha_k})(G)$$

Our reasoning again only uses one property of this definition, namely that the
simultaneous executions can be sequentialised in an arbitrary order. In
the setting of the above definition this means:

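Lemma 7.3.2 (Sequentialisation Lemma). Let $\pi$ be a permutation of $\{1, \dots, k\}$.
Then there are transformations $\mathrm{TR}'_{\alpha_{\pi(1)}}, \dots, \mathrm{TR}'_{\alpha_{\pi(k)}}$ such that

$$(\mathrm{TR}_{\alpha_1} \parallel \dots \parallel \mathrm{TR}_{\alpha_k})(G) = \mathrm{TR}'_{\alpha_{\pi(1)}}; \dots; \mathrm{TR}'_{\alpha_{\pi(k)}}(G)$$

In particular, the first transformation coincides exactly with its component
transformation, i.e. $\mathrm{TR}'_{\alpha_{\pi(1)}} = \mathrm{TR}_{\alpha_{\pi(1)}}$.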

Now we can define the components of the exhaustive iteration process: 

Definition 7.3.1 (Simultaneous Iteration Steps).

Let $(\mathcal{T}_{am}, \mathcal{T}_{el})$ be a weakly consistent UMEC and $\mathcal{AP}(G) = \{\alpha_1, \dots, \alpha_k\}$.

1. A maximal simultaneous assignment motion step with respect to $G$ is
defined by

$$\mathrm{AM}^\mu_\parallel(G) \stackrel{def}{=} (\mathrm{AM}^\mu_{\alpha_1} \parallel \dots \parallel \mathrm{AM}^\mu_{\alpha_k})(G) \quad \text{and}$$

2. A maximal simultaneous elimination step with respect to $G$ is defined by

$$\mathrm{EL}^\mu_\parallel(G) \stackrel{def}{=} (\mathrm{EL}^\mu_{\alpha_1} \parallel \dots \parallel \mathrm{EL}^\mu_{\alpha_k})(G)$$

For assignment motions we choose the program where insertions of independent
assignments at the same program point are ordered in accordance with their indices.



7.3.2 Conventions 

For the sake of presentation the reasoning on complexity is exemplified for
MECs with assignment sinkings.

As noted in Section 7.2.3, iteration sequences only stabilise up to local
reorderings within basic blocks. Therefore, the term stabilisation always addresses
stabilisation modulo program equivalence as introduced in Definition 7.2.2.
This means that a program $G$ is stable up to local reorderings if and only if
every applicable elementary step $G \mapsto G'$ yields a program $G'$ that is
equivalent to $G$ in this sense. Note that in the case of consistent UMECs this
always guarantees that the result of this process is unique up to local
reorderings (cf. Corollary 7.2.2).

The complexity of an exhaustive iteration sequence with respect to an
underlying MEC is stated in terms of a penalty factor that captures the additional
costs in comparison to a single application of a simultaneous iteration step.
A factor that is of particular importance is the length of a maximal chain of
blockades that occurs in a program:

Definition 7.3.2 (Chain of Blockades). Let $p \equiv (p_1, \dots, p_k) \in \mathbf{P}$. A
(forwards directed) chain of blockades on $p$ is a subsequence $(p_{i_1}, \dots, p_{i_r})$ of
$p$ such that for any $1 \le j < r$ the nodes $p_{i_j}$ and $p_{i_{j+1}}$ are associated with
assignment patterns $\alpha$ and $\beta$, respectively, such that

— $\beta$ blocks $\alpha$ and

— the subpath $(p_{i_j+1}, \dots, p_{i_r})$ does not contain an immobile instruction like
$out(\varphi)$ that blocks $\alpha$.

For MECs with assignment hoistings chains of blockades are defined dually.
Let us denote the length of the longest chain of blockades on a path $p$ by
$\Delta^{block}_p$.
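As a minimal sketch of Definition 7.3.2 for a single straight-line path, the following Python code computes $\Delta^{block}_p$ with a quadratic dynamic program over the path. It assumes a simplified blocking relation for assignment sinking (a later assignment $\beta$ blocks $\alpha$ if it reads or writes the left-hand side of $\alpha$, or overwrites an operand of $\alpha$) and it ignores immobile instructions such as $out(\varphi)$. The names `Assign`, `blocks` and `longest_blockade_chain` are illustrative and do not come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assign:
    """An assignment lhs := rhs, with rhs abstracted to the set of variables it reads."""
    lhs: str
    rhs_vars: frozenset


def blocks(beta: Assign, alpha: Assign) -> bool:
    """Assumed sinking blockade: a later beta stops alpha from being sunk past it."""
    return (alpha.lhs in beta.rhs_vars      # beta uses the value alpha defines
            or beta.lhs == alpha.lhs        # beta overwrites alpha's target
            or beta.lhs in alpha.rhs_vars)  # beta modifies an operand of alpha


def longest_blockade_chain(path: list) -> int:
    """Delta^block_p: longest subsequence in which each element blocks its predecessor."""
    best = [1] * len(path)  # best[j] = longest chain ending at path[j]
    for j in range(len(path)):
        for i in range(j):
            if blocks(path[j], path[i]):
                best[j] = max(best[j], best[i] + 1)
    return max(best, default=0)


# x := a + b; y := x + 1; z := y + 1 forms a chain of length 3.
path = [Assign("x", frozenset({"a", "b"})),
        Assign("y", frozenset({"x"})),
        Assign("z", frozenset({"y"}))]
print(longest_blockade_chain(path))  # 3
```

Only consecutive chain members have to be related by `blocks`, which is why a longest-increasing-subsequence style recurrence is sufficient here.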

In contrast to the presentation up to this point, the reasoning on the complexity
takes advantage of the basic block view of a flow graph, since the
general structure of the basic block flow graph does not change throughout
the repeated application of transformation steps.

7.3.3 The General Case: Weakly Consistent UMECs 

First, we are going to examine the convergence behaviour under the most 
general assumption that guarantees the existence of simultaneous iteration 
steps, that is weak consistency. 



i.e. a path with elementary nodes.

Actually, the direction of assignment motion is only relevant for the condition 
on the absence of immobile blockades. 
The Upper Bound. Even under this assumption the iteration process is
guaranteed to stabilise within a quadratic number of iteration steps.
Essentially, this is due to the fact that every basic block $n$ in the flow graph can
“absorb” no more assignments than there are on any path leading from $s$ to
$n$. This gives rise to the following definition:



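$$\Delta^{block}_n \stackrel{def}{=} \min_{p \in \mathbf{P}[s, \mathit{last}(n)]} \Delta^{block}_p \qquad\qquad \Delta^{block} \stackrel{def}{=} \max_{n \in N} \Delta^{block}_n$$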

In particular, note that the value of $\Delta^{block}_n$ is determined by an acyclic path.
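Both parameters can be evaluated by brute force on a small acyclic flow graph of basic blocks. The Python sketch below represents each assignment as a pair of its left-hand side and the set of variables it reads. It assumes a simplified sinking blockade: a later assignment blocks an earlier one if it reads or writes the earlier one's left-hand side or overwrites one of its operands. Because it enumerates every path from the start block, it is only meant for tiny examples. The names `delta_block`, `paths_to` and `chain_length`, as well as the example graph, are hypothetical.

```python
from itertools import chain


def blocks(beta, alpha):
    """Assumed sinking blockade between (lhs, rhs_vars) pairs; beta comes later."""
    a_lhs, a_rhs = alpha
    b_lhs, b_rhs = beta
    return a_lhs in b_rhs or b_lhs == a_lhs or b_lhs in a_rhs


def chain_length(path):
    """Length of the longest chain of blockades on a straight-line path."""
    best = [1] * len(path)
    for j in range(len(path)):
        for i in range(j):
            if blocks(path[j], path[i]):
                best[j] = max(best[j], best[i] + 1)
    return max(best, default=0)


def paths_to(succ, start, goal):
    """All block paths from start to goal; assumes the graph is acyclic."""
    if start == goal:
        return [[goal]]
    return [[start] + rest
            for nxt in succ.get(start, [])
            for rest in paths_to(succ, nxt, goal)]


def delta_block(succ, code, start):
    """Return (Delta^block, {n: Delta^block_n}), where Delta^block_n is a minimum over paths."""
    per_block = {}
    for n in code:
        lengths = [chain_length(list(chain.from_iterable(code[b] for b in p)))
                   for p in paths_to(succ, start, n)]
        if lengths:  # ignore blocks that are unreachable from start
            per_block[n] = min(lengths)
    return max(per_block.values()), per_block


# Block 1 branches to blocks 2 and 3, which join again in the empty block 4.
code = {1: [("x", {"a", "b"}), ("y", {"x"})], 2: [("z", {"y"})],
        3: [("w", {"c"})], 4: []}
succ = {1: [2, 3], 2: [4], 3: [4]}
print(delta_block(succ, code, 1))  # (3, {1: 2, 2: 3, 3: 2, 4: 2})
```

The join block 4 gets the value 2 rather than 3, because the path through block 3 only carries a chain of length 2. This matches the remark that a block can absorb no more assignments than there are on any path leading to it.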

Since the blockade relation among moved occurrences is preserved, at most
$\Delta^{block}_n$ occurrences, including the assignments at $n$, are moved to the basic
block $n$ during the exhaustive iteration. Hence the overall number of initial
assignments and of assignments that enter a new basic block due to assignment
motion can be estimated by $|N| \cdot \Delta^{block}$. Since every simultaneous
elimination and assignment motion step influences at least one of these
assignments before stabilisation, we have:

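Theorem 7.3.1 (A General Upper Bound). For every weakly consistent
UMEC an alternating sequence of simultaneous elimination and assignment
motion steps, i.e. the iteration sequence $(\mathrm{EL}^\mu_\parallel; \mathrm{AM}^\mu_\parallel)^*(G)$, stabilises within

$$|N| \cdot \Delta^{block}$$

simultaneous iteration steps of both kinds, i.e. $(\mathrm{EL}^\mu_\parallel; \mathrm{AM}^\mu_\parallel)^{|N| \cdot \Delta^{block}}(G)$
is stable up to local reorderings.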
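The alternating iteration sequence itself is a simple driver loop. The following Python sketch is generic in the program representation. Its `el_step` and `am_step` arguments are hypothetical callables standing for the simultaneous steps $\mathrm{EL}^\mu_\parallel$ and $\mathrm{AM}^\mu_\parallel$, and stabilisation is checked by plain equality rather than equality up to local reorderings. The final round, which leaves the program unchanged, is the additional step needed to recognise stabilisation.

```python
from typing import Callable, Tuple, TypeVar

P = TypeVar("P")


def iterate_until_stable(g: P,
                         el_step: Callable[[P], P],
                         am_step: Callable[[P], P],
                         max_rounds: int = 10_000) -> Tuple[P, int]:
    """Run (EL; AM) rounds until one round leaves g unchanged.

    Returns the stable program and the number of rounds that changed it.
    """
    changing_rounds = 0
    for _ in range(max_rounds):
        nxt = am_step(el_step(g))
        if nxt == g:  # simplification: exact equality, not 'up to local reorderings'
            return g, changing_rounds
        g, changing_rounds = nxt, changing_rounds + 1
    raise RuntimeError("no stabilisation within max_rounds")


# Toy stand-in: the "program" is a counter that each motion step decreases by one.
print(iterate_until_stable(3, lambda g: g, lambda g: max(g - 1, 0)))  # (0, 3)
```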

The Lower Bound. Unfortunately, the estimation given in Theorem 7.3.1
would mean a compile-time degradation that is considered unacceptable in practice.
Whereas the factor $\Delta^{block}$ can be expected to be reasonably small in practice,
though in the worst case it may even reach $|Occ_{\mathcal{AP}}(G)|$, the more serious
inefficiency is given by the factor $|N|$. Unfortunately, a linear dependency on
this parameter may show up even for simple acyclic programs. In the following
we show that the upper bound given in Theorem 7.3.1 is asymptotically
tight. To this end, we reinvestigate our example that combines assignment
sinking with the elimination of locally redundant assignments (cf. Section 7.2,
page 157). Figure 7.5 provides a scheme of programs where any stabilising
iteration sequence is of quadratic order.



Again the dual notions for MECs with assignment hoistings are straightforward.
In practice, one additional step of both kinds of transformations may be required 
in order to recognise stabilisation. 
Fig. 7.5. Illustrating slow stabilisation



Essentially, this program scheme is built of the start block 1 followed
by a sequence of $k \ge 1$ identical program parts.

The point of this example is that the sequence of blockades in the first node
can successively be sunk to the entry of basic block $3i+2$ with $0 \le i \le k$,
each movement requiring three simultaneous assignment motion steps.

Figure 7.6 displays the situation after the first two iteration steps; two
assignments have been moved from block 1 to block 3 and to the entry
of block 2, respectively, where in the latter case they are blocked by the
assignment $x := a+b$. In the next iteration step the occurrence of $x := a+b$
at block 1 can also be sunk, enabling the elimination of the original occurrence
of $x := a + b$ at this site, since it becomes locally redundant. In fact, as
displayed in Figure 7.7, the assignment sequence has moved from block 1
to blocks 2 and 3, where essentially the same procedure applies in order to
resolve the next blockade at node 5.

In the program scheme of Figure 7.5 the value of $\Delta^{block}$ is 4, being determined
by the chain of blocking assignments on the path (1, 2). Hence, by increasing
the number of blockades in node 1, we can easily set $\Delta^{block}$ to any
value we want. Thus we have:

Theorem 7.3.2 (A General Lower Bound). There is a weakly consistent
UMEC and a family of acyclic programs $(G_{i,j})$, where $G_{i,j}$ has $i$
basic blocks and $\Delta^{block} = j$, such that the number of simultaneous iteration
steps that are necessary in order to reach a stable solution is of order $\Omega(i \cdot j)$.
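Placing the two theorems side by side, and reading the bound of Theorem 7.3.1 as $|N| \cdot \Delta^{block}$, shows that the quadratic estimate is tight for this family. The symbol $\mathit{steps}(G_{i,j})$ is introduced only for this restatement. It denotes the number of simultaneous iteration steps needed to stabilise $G_{i,j}$, for which $|N| = i$ and $\Delta^{block} = j$:

$$\mathit{steps}(G_{i,j}) \in \Omega(i \cdot j) \quad\text{and}\quad \mathit{steps}(G_{i,j}) \le |N| \cdot \Delta^{block} = i \cdot j \quad\Longrightarrow\quad \mathit{steps}(G_{i,j}) \in \Theta(i \cdot j)$$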
In this example assignment sinking cannot take advantage of the simultaneous
transformations; at most one assignment pattern can be moved at a time.
Fig. 7.6. After the second simultaneous iteration step
Fig. 7.7. After the fourth simultaneous iteration step

7.3.4 Consistent UMECs 

Fortunately, the “odd” behaviour of the previous section does not occur in
practically relevant, and thus consistent, UMECs. In the following we are
going to provide two additional conditions ensuring that a CUMEC
stabilises significantly faster. The bounds are given in terms of a linear
parameter. In fact, we get rid of the “bad” parameter $|N|$ completely, while only
slightly increasing the second parameter $\Delta^{block}$. The most striking advantage
of CUMECs is the fact that their motion-elimination effects are particularly
simple: assignment motion only increases the elimination potential for
assignments that are actually moved. In the light of this property the pathological
behaviour of the example from Figure 7.5 to Figure 7.7 is excluded, since this
example is based on the effect that the “unmoved” occurrences of $x := a + b$
become eliminable through other movements, causing late resolution of
blockades.

7.3.4.1 Absence of Elimination-Elimination Effects. First we will examine
the situation where a CUMEC is free of any elimination-elimination
effect (EE-effect), i.e. for any elimination $\mathrm{EL}_\beta \equiv G \mapsto_\beta G'$ and any
occurrence $occ_\alpha \in Occ_\alpha(G)$ there holds:

$$\mathit{eliminable}(\underbrace{\mathrm{EL}_\beta(G)}_{G'}, occ_\alpha) \;\Rightarrow\; \mathit{eliminable}(G, occ_\alpha) \qquad (7.9)$$
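Dead assignment elimination is a simple transformation with EE-effects, i.e. one that violates Implication (7.9). The Python sketch below is restricted to straight-line code and uses plain backward liveness. It shows that removing the dead assignment `y := x` makes the previously live assignment `x := a` dead. The representation of an assignment as a pair of its left-hand side and the set of variables it reads, and the function names, are assumptions made for illustration.

```python
def dead_assignments(prog, live_out):
    """Indices of assignments in a straight-line program whose left-hand side is dead.

    prog is a list of (lhs, rhs_vars); live_out holds the variables used afterwards.
    """
    live, dead = set(live_out), set()
    for idx in range(len(prog) - 1, -1, -1):
        lhs, rhs_vars = prog[idx]
        if lhs not in live:
            dead.add(idx)
        live.discard(lhs)      # the assignment defines lhs ...
        live |= set(rhs_vars)  # ... and uses its operands, even if it is dead
    return dead


def dae_step(prog, live_out):
    """One simultaneous dead assignment elimination step."""
    dead = dead_assignments(prog, live_out)
    return [instr for idx, instr in enumerate(prog) if idx not in dead]


G = [("x", {"a"}), ("y", {"x"})]  # x := a; y := x with nothing live at the end
print(dead_assignments(G, set()))                   # {1}: only y := x is dead
print(dead_assignments(dae_step(G, set()), set()))  # {0}: now x := a is dead too
```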

In UMECs without EE-effects the number of iterations mostly depends on
the assignment motions involved. However, even in this case an improved
upper bound compared to the situation in Section 7.3.3 is not obvious,
since all other second order effects, i.e. motion-motion, motion-elimination and
elimination-motion effects (cf. Section 6.2.3), are still present. It should also
be noted that the consistency requirement cannot be omitted, since the
effect discussed in Figure 7.5 to Figure 7.7 does not depend on the presence of
EE-effects.

The key for proving our new bound is a projection from the iteration
behaviour to the syntactical structure of the original program. The point is to
show that code being moved within the $i$-th iteration is caused by an acyclic
chain of blocking assignments in the original program with a length of at least
$i$.

Lemma 7.3.3 (Iteration Lemma for EE-Effect Free CUMECs).

Let us consider a CUMEC without EE-effects and an iteration sequence

$$G_i \stackrel{def}{=} (\mathrm{EL}^\mu_\parallel; \mathrm{AM}^\mu_\parallel)^i(G), \quad 0 \le i \le r$$

Let $i \ge 1$ and let $\alpha$ be movable across the entry of a basic block $n$ within the
$i$-th assignment motion step, i.e. we have $\mathit{Just}^\alpha_{\dot n}$, where $\dot n$ is the
entry point of the node $\mathit{first}(n)$ in the intermediate program $\mathrm{EL}^\mu_\parallel(G_{i-1})$.

Then there exists a path $p \in \mathbf{P}_G[s, \mathit{first}(n)[$ such that an acyclic prefix of $p$
contains a chain of blockades that

— starts with an occurrence in $Orig(occ_\alpha)$ and

— is at least of length $i$.

Proof. We proceed by induction on $i$. The induction base $i = 1$ is trivial,
since on every path $p \in \mathbf{P}_{\mathrm{EL}^\mu_\parallel(G)}[s, \mathit{first}(n)[$ the substituted occurrence of $\alpha$
itself defines a chain of blockades of length 1.

Thus let us assume that $i > 1$ and consider an occurrence of $\alpha$ that is moved across the
entry of $n$ within the $i$-th simultaneous assignment motion step. According
to the definition of admissible assignment sinkings there is an occurrence
of $\alpha$ on every path leading from $s$ to the entry of $n$, as shown in Figure 7.8.
Due to the maximality of the assignment sinkings (cf. Section 7.3.1), the
situation immediately before the $i$-th simultaneous elimination step is as
illustrated in Figure 7.9: there is at least one path $p \in \mathbf{P}[s, \mathit{first}(n)[$ where
the last occurrence of $occ_\alpha$ is followed by an occurrence $occ_\beta$ at a block $m$
such that $\beta$ blocks $\alpha$. This blockade is resolved within the $i$-th simultaneous
elimination step. Hence we have:

$$\mathit{eliminable}(G_{i-1}, occ_\beta) \qquad (7.10)$$

Note that this is a chain of blockades in the original program G. 

Recall that we are restricted to assignment sinkings for the sake of presentation. 
Fig. 7.8. Situation before the i-th simultaneous assignment motion step



Let us now examine the two possible alternatives for the behaviour of $occ_\beta$
during the $(i-1)$-st simultaneous assignment motion step:

1. $occ_\beta$ is moved at most locally within $m$, or

2. $occ_\beta$ is moved globally, i.e. $\beta$ is moved across the entry of $m$.




Fig. 7.9. Situation before the i-th simultaneous elimination step



Starting with point (1), we will show that this case cannot happen. Therefore,
let us assume

$$Src_{\mathrm{AM}^\mu_\parallel}(occ_\beta) = \{occ'_\beta\}$$

with $occ'_\beta$ being located at basic block $m$, too, and lead this assumption to
a contradiction. Due to Definition 7.2.1(2) we have

$$\mathit{eliminable}(\mathrm{EL}^\mu_\parallel(G_{i-1}), occ'_\beta)$$

According to Lemma 7.3.2 the simultaneous elimination step can be expressed
by means of a suitable sequential representation:

$$\mathit{eliminable}(\mathrm{EL}_\beta; \mathrm{EL}'_{\alpha_1}; \dots; \mathrm{EL}'_{\alpha_k}(G_{i-1}), occ'_\beta)$$

where $\mathrm{EL}'_{\alpha_f}$ $(f = 1, \dots, k)$ refers to elimination transformations with $\alpha_f \neq \beta$.
Repeated application of the premise on the absence of EE-effects finally leads
to

$$\mathit{eliminable}(G_{i-1}, occ'_\beta)$$

This, however, contradicts the maximality of the elimination transformations
involved, since $occ'_\beta$ is assumed to survive the $i$-th simultaneous
elimination step.
Hence only point (2) remains, which ensures that the induction hypothesis
becomes applicable. This yields:

Induction Hypothesis: There exists a path $q \in \mathbf{P}_G[s, \mathit{first}(m)[$ such that an
acyclic prefix of $q$ contains a chain of blockades that

— starts with an occurrence $occ^\circ_\beta \in Orig(occ_\beta)$ and

— is at least of length $i - 1$.

Justification of the $\alpha$-assignment motion in the $i$-th iteration step ensures
that there is also an occurrence $occ'_\alpha$ of $\alpha$ on $q$, as displayed in Figure 7.10(a).





Fig. 7.10. Applying the induction hypothesis 



Since $occ'_\alpha$ is blocked by $occ_\beta$, this blockade must already be present in the
original program $G$, yielding an occurrence $occ^\circ_\alpha \in Orig(occ'_\alpha)$ that precedes
$occ^\circ_\beta$. This finally establishes the existence of a chain of blockades on an
acyclic prefix of $p$ starting with $occ^\circ_\alpha$ and having a length that is at least $i$
(see Figure 7.10(b) for illustration). □

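Let us now consider a slight variant of the parameter $\Delta^{block}$:

$$\hat{\Delta}^{block}_n \stackrel{def}{=} \max_{p \in \mathbf{P}[s, \mathit{first}(n)[}\ \max_{q \text{ acyclic prefix of } p} \Delta^{block}_q \qquad\qquad \hat{\Delta}^{block} \stackrel{def}{=} \max_{n \in N} \hat{\Delta}^{block}_n$$

The essential difference to $\Delta^{block}$ is that $\hat{\Delta}^{block}$ is defined by means of the
maximum operator, however restricted to acyclic paths, which excludes
infinite chains of blockades.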

An immediate consequence of Lemma 7.3.3 is:

Theorem 7.3.3 (An Upper Bound for EE-Effect Free CUMECs).

For a CUMEC without EE-effects the iteration sequence $(\mathrm{EL}^\mu_\parallel; \mathrm{AM}^\mu_\parallel)^*(G)$
stabilises within

$$\hat{\Delta}^{block}$$

steps of both assignment motion and elimination, i.e.

$$(\mathrm{EL}^\mu_\parallel; \mathrm{AM}^\mu_\parallel)^{\hat{\Delta}^{block}}(G)$$

is stable up to local reorderings within basic blocks.

This estimation can directly be applied to PFCE as introduced in Chapter 6.

Theorem 7.3.4 (An Upper Bound for PFCE).

For PFCE the sequence $(\mathrm{FAE}^\mu_\parallel; \mathrm{AS}^\mu_\parallel)^*(G)$ stabilises within $\hat{\Delta}^{block}$
simultaneous assignment sinking and faint assignment elimination steps, reaching a
result that is $\precsim_{ass}$-optimal in the universe $\mathcal{U}_{PFCE}(G)$.

Proof. The complexity estimation is an immediate consequence of Theorem
7.3.3 and the fact that, by definition, the elimination of partially faint
assignments is free of EE-effects. Hence, we are left to show that the resulting
program $G_{res}$ is $\precsim_{ass}$-optimal in $\mathcal{U}_{PFCE}(G)$. First, Lemma 7.3.1 guarantees
that $G_{res}$ is stable up to local reorderings under every assignment sinking and
faint assignment elimination. Thus, according to Corollary 7.2.2, $G_{res}$ is an optimal
element of the preorder $\mapsto^*$. Finally, since both assignment sinkings and
faint assignment eliminations are compatible with $\precsim_{ass}$, Lemma 7.2.4 yields
that $G_{res}$ is $\precsim_{ass}$-optimal within $\mathcal{U}_{PFCE}(G)$. □

At this point it should be recalled that the elimination of faint assignments is
the only component transformation that cannot take advantage of bit-vector
data flow analyses. However, at a later point we will see how to speed up
the above iteration sequence by reducing the number of faint assignment
elimination steps to a single one (cf. Theorem 7.3.7).

7.3.4.2 Weak EE-Effects. Unfortunately, some practically relevant problems
like PRAE and PDCE have EE-effects. However, in these cases we
succeed by showing a slightly weaker criterion that guarantees that EE-effects
are completely restricted to initial ones. Essentially, assignment motion must
not create “new” opportunities for EE-effects. More formally, we call the
EE-effects of a UMEC weak if and only if for every $\alpha, \beta \in \mathcal{AP}$ (not necessarily
different), $\mathrm{AM}_\alpha \equiv G \mapsto_\alpha G' \in \mathcal{T}_{am}$ and $\mathrm{EL}_\alpha \equiv G' \mapsto_\alpha G'' \in \mathcal{T}_{el}$ such
that $G$ does not contain eliminable occurrences of $\alpha$ we have:

$$\mathit{eliminable}(\underbrace{\mathrm{AM}_\alpha; \mathrm{EL}_\alpha(G)}_{G''}, occ_\beta) \;\Rightarrow\; \mathit{eliminable}(\underbrace{\mathrm{AM}_\alpha(G)}_{G'}, occ_\beta) \qquad (7.11)$$

The key to fast stabilisation in CUMECs with only weak EE-effects is the
observation that these effects can be resolved completely in advance. Once such a
preprocess has been performed, the iteration sequence behaves like one of a
CUMEC without EE-effects, which allows us to adopt the results of Theorem
7.3.3.
Theorem 7.3.5 (Upper Bound for CUMECs with Weak EE-Effects).

Let us consider a CUMEC with weak EE-effects. Then there is an iteration
sequence of the form $(\mathrm{EL}^\mu_\parallel)^*; (\mathrm{AM}^\mu_\parallel; \mathrm{EL}^\mu_\parallel)^*(G)$ that stabilises within

$\Delta^{init\text{-}elims} + \hat{\Delta}^{block} + 1$ simultaneous elimination steps and
$\hat{\Delta}^{block} + 1$ simultaneous assignment motion steps,

where $\Delta^{init\text{-}elims}$ denotes the number of simultaneous elimination steps that
is sufficient to guarantee stabilisation of $(\mathrm{EL}^\mu_\parallel)^*(G)$.

Proof. Let $G'$ denote the program that results from $G$ after the initial
sequence of simultaneous elimination steps gets stable. As in Lemma 7.3.3, let
us further abbreviate the programs resulting from $G'$ by means of further
iteration steps in the following way:

$$G'_i \stackrel{def}{=} (\mathrm{AM}^\mu_\parallel; \mathrm{EL}^\mu_\parallel)^i(G'), \quad i \ge 0$$

In order to adopt the reasoning on the number of iterations from Lemma
7.3.3, it is enough to show that for all $G'_i$, $i \ge 0$, there holds:

$$\text{No occurrence in } G'_i \text{ is eliminable, and} \qquad (7.12a)$$

$$\mathrm{AM}^\mu_\parallel(G'_i) \text{ is free of EE-effects.} \qquad (7.12b)$$

We proceed by induction on the number of simultaneous iteration steps
performed on $G'$. The induction base for $G'_0$ is trivial. Therefore, we are left
to show the induction step. For the proof of Property (7.12a) let us assume
that

$$\mathit{eliminable}(G'_i, occ_\alpha)$$

holds for $i \ge 1$. This can be rewritten as

$$\mathit{eliminable}(\mathrm{EL}^\mu_\parallel(\mathrm{AM}^\mu_\parallel(G'_{i-1})), occ_\alpha)$$

According to Lemma 7.3.2 the simultaneous elimination steps can be
sequentialised as follows:

$$\mathit{eliminable}(\mathrm{EL}_{\alpha_1}; \dots; \mathrm{EL}_{\alpha_k}(\mathrm{AM}^\mu_\parallel(G'_{i-1})), occ_\alpha)$$



Note that $\Delta^{init\text{-}elims}$ is always bounded by $|Occ_{\mathcal{AP}}(G)|$, the total number of
occurrences in the original program. The factor $\hat{\Delta}^{block}$, however, can even refer to the
program $(\mathrm{EL}^\mu_\parallel)^*(G)$ instead of $G$.

Because an incrementation of $i$ means two simultaneous iteration steps, this is
not an induction on $i$.
Multiple application of the induction hypothesis that $\mathrm{AM}^\mu_\parallel(G'_{i-1})$ has no
EE-effects finally delivers:

$$\mathit{eliminable}(\mathrm{AM}^\mu_\parallel(G'_{i-1}), occ_\alpha)$$

Due to the maximality of the eliminations employed, this would mean that
$occ_\alpha$ gets eliminated during the $(i-1)$-st simultaneous elimination step, which
contradicts the assumption $occ_\alpha \in G'_i$. Thus we have

$$\neg\,\mathit{eliminable}(G'_i, occ_\alpha)$$

as desired.

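For the remaining proof of (7.12b) suppose that

$$\mathit{eliminable}(\mathrm{EL}_\alpha(\mathrm{AM}^\mu_\parallel(G'_i)), occ_\beta)$$

holds for an occurrence $occ_\beta$ and an elimination transformation
$\mathrm{EL}_\alpha \equiv \mathrm{AM}^\mu_\parallel(G'_i) \mapsto_\alpha G''$. Rewriting this expression delivers:

$$\mathit{eliminable}(\mathrm{AM}^\mu_\parallel; \mathrm{EL}_\alpha(G'_i), occ_\beta)$$

According to Lemma 7.3.2 this can be written as

$$\mathit{eliminable}(\mathrm{AM}_{\alpha_1}; \dots; \mathrm{AM}_{\alpha_k}; \mathrm{EL}_\alpha(G'_i), occ_\beta) \qquad (7.13)$$

with $\alpha_k = \alpha$. According to the induction hypothesis no occurrences of $\alpha$ are
eliminable in $G'_i$. Repeated usage of the contrapositive of the $\Leftarrow$-direction of
Definition 7.2.1(2) delivers that this property still holds for the program

$$\tilde{G}_i \stackrel{def}{=} \mathrm{AM}_{\alpha_1}; \dots; \mathrm{AM}_{\alpha_{k-1}}(G'_i)$$

With this notation the predicate of (7.13) can be rewritten as:

$$\mathit{eliminable}(\mathrm{AM}_{\alpha_k}; \mathrm{EL}_\alpha(\tilde{G}_i), occ_\beta)$$

At this point we can exploit our assumption on the weakness of the EE-effects,
yielding

$$\mathit{eliminable}(\mathrm{AM}_\alpha(\tilde{G}_i), occ_\beta)$$

Repeatedly applying the $\Rightarrow$-direction of Definition 7.2.1(2) and using Lemma
7.3.2 we finally obtain

$$\mathit{eliminable}(\mathrm{AM}^\mu_\parallel(G'_i), occ_\beta) \qquad \Box$$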



Applications of Theorem 7.3.5 are PDCE as well as PRAE. Moreover,
Theorem 7.3.5 is the key for a surprising improvement on the complexity of
PFCE. Starting with the first point, we can immediately prove:

Lemma 7.3.4. Both PDCE and PRAE are CUMECs with only weak EE-effects.

Proof. Consistency is already proved in Theorem 7.2.3. Therefore, we only
have to check the weakness of the EE-effects.

Weakness of EE-effects in PDCE: Let us assume assignment patterns
$\alpha \equiv x := \varphi$ and $\beta \equiv y := \psi$ such that no $\alpha$-occurrences are dead in
$G$. Moreover, let us consider an admissible assignment sinking $\mathrm{AM}_\alpha \equiv
G \mapsto_\alpha G'$, a dead assignment elimination $\mathrm{EL}_\alpha \equiv G' \mapsto_\alpha G''$ and a
$G'$-occurrence $occ_\beta$ of $\beta$. To prove the contrapositive of Implication (7.11) we
are going to show that liveness of $occ_\beta$ in $G'$ also forces liveness of $occ_\beta$
in $G''$. Liveness of $occ_\beta$ in $G'$ implies the existence of a corresponding
use site of $y$ being associated with an occurrence of an instruction $\gamma$
along a path $p$ leading to $e$. If $\alpha \neq \gamma$, then liveness of $occ_\beta$ carries
over immediately to $G''$, as the use site of $occ_\beta$ is not influenced by $\mathrm{EL}_\alpha$.
Otherwise, if $\alpha = \gamma$, the situation gets more sophisticated. Therefore, let us
denote the relevant $p$-path occurrences of $\beta$ and $\gamma$ by $occ_{\beta,p}$ and $occ_{\gamma,p}$,
respectively. Obviously, the $G$-path occurrence $occ'_{\gamma,p} = Src_{\mathrm{AM}_\alpha}(occ_{\gamma,p})$
must follow $Src_{\mathrm{AM}_\alpha}(occ_{\beta,p})$. Due to the general assumption on $G$ we can
rely on the fact that $occ'_\gamma$ is not dead. Hence consistency delivers that
there is at least one occurrence $occ''_\gamma \in Dst_{\mathrm{AM}_\alpha}(occ'_\gamma)$ that is alive in $G'$.
By construction, this occurrence follows $occ_\beta$ on some program path such
that there are no modifications in between, which finally establishes that
$occ_\beta$ is not dead in $G''$.

Weakness of EE-effects in PRAE: Let us assume assignment patterns
$\alpha \equiv x := \varphi$ and $\beta \equiv y := \psi$ such that no $\alpha$-occurrences are redundant
in $G$. Moreover, let us consider an admissible assignment hoisting $\mathrm{AM}_\alpha \equiv
G \mapsto_\alpha G'$, a redundant assignment elimination $\mathrm{EL}_\alpha \equiv G' \mapsto_\alpha G''$
and a $G'$-occurrence $occ_\beta$ of $\beta$. In order to prove the contrapositive of
Implication (7.11), let us assume that $occ_\beta$ is not redundant in $G'$ and
show that this property carries over to $G''$. According to the definition
of redundancy (cf. Definition 6.4.1) there has to be a program path $p$
leading from $s$ to $occ_\beta$ such that every prior $\beta$-occurrence on $p$ is followed
by a path occurrence $occ_{\gamma,p}$ of an instruction $\gamma$ that either modifies $y$
or an operand of $\psi$. Let us denote the relevant $p$-path occurrence of
$occ_\beta$ by $occ_{\beta,p}$ and define $occ'_{\gamma,p} = Src_{\mathrm{AM}_\alpha}(occ_{\gamma,p})$. Then, by the general
assumption on $G$, $occ'_\gamma$ is not redundant, and furthermore it is easy to
see that $occ'_{\gamma,p}$ precedes $Src_{\mathrm{AM}_\alpha}(occ_{\beta,p})$. Consistency then yields that
there is a $G'$-occurrence $occ''_\gamma \in Dst_{\mathrm{AM}_\alpha}(occ'_\gamma)$ that is not redundant. By
construction, $occ''_\gamma$ defines a modification site for $occ_\beta$ in $G''$ that precedes
$occ_\beta$ on some program path such that all prior path occurrences of $\beta$ are
situated before the relevant path occurrence of $occ''_\gamma$. This finally prevents
$occ_\beta$ from being redundant. □

Without loss of generality we may assume that $p \in \mathbf{P}[s, e]$.
The previous lemma is the key for estimating the complexity of PDCE and
PRAE. Before doing so, let us be more concrete about the number of initial
elimination transformations. Fortunately, for both applications this factor is
determined by acyclic definition-use chains [W.78, Ken81], which are defined
as follows:

Definition 7.3.3 (Definition-Use Chains). Let $p \equiv (p_1, \dots, p_k) \in \mathbf{P}$.
A def-use chain on $p$ is a subsequence $(p_{i_1}, \dots, p_{i_r})$ of $p$ such that for any
$1 \le j < r$ the nodes $p_{i_j}$ and $p_{i_{j+1}}$ are associated with assignment patterns
$\alpha \equiv x := \varphi$ and $\beta \equiv y := \psi$, respectively, such that

— $x \in SubExpr^*(\psi)$ and

— the subpath $(p_{i_j+1}, \dots, p_{i_{j+1}-1})$ does not contain an assignment to $x$.
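For a straight-line path, Definition 7.3.3 again leads to a quadratic dynamic program. The sketch assumes that each node carries an assignment given as its left-hand side and the set of variables occurring in its right-hand side (standing in for $SubExpr^*$). It checks explicitly that no intermediate node assigns to $x$. The function name is hypothetical.

```python
def longest_def_use_chain(path):
    """Longest def-use chain on a straight-line path of (lhs, rhs_vars) pairs.

    A step from path[i] to path[j] (i < j) requires that path[j] reads the
    variable x assigned at path[i] and that no node strictly between assigns x.
    """
    best = [1] * len(path)  # best[j] = longest chain ending at path[j]
    for j, (_, rhs_j) in enumerate(path):
        for i in range(j):
            x = path[i][0]
            if x in rhs_j and all(path[k][0] != x for k in range(i + 1, j)):
                best[j] = max(best[j], best[i] + 1)
    return max(best, default=0)


# a := a + 1; b := a + 1; c := b + 1 gives a chain of length 3.
print(longest_def_use_chain([("a", {"a"}), ("b", {"a"}), ("c", {"b"})]))  # 3
# The redefinition of x cuts the first assignment off from the use in y := x.
print(longest_def_use_chain([("x", {"p"}), ("x", {"q"}), ("y", {"x"})]))  # 2
```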

In contrast to chains of blockades (cf. Definition 7.3.2), def-use chains need
not lie along acyclic program paths. For instance, in Figure 7.11
the elimination of dead assignments would proceed by eliminating the
assignments in the order $c := b + 1$, $b := a + 1$ and $a := a + 1$, as reflected by
the def-use chain in the figure.




Fig. 7.11. Def-use chain and the elimination of dead assignments
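This round-by-round behaviour can be reproduced on a straight-line analogue of Figure 7.11 (without the loop). The Python sketch below assumes plain backward liveness on pairs of a left-hand side and the set of variables read. It repeats simultaneous dead assignment elimination until a round removes nothing, recording what each round eliminates. The function names are illustrative.

```python
def dae_round(prog, live_out):
    """One simultaneous dead assignment elimination; returns (survivors, removed lhs)."""
    live, keep, removed = set(live_out), [], []
    for lhs, rhs_vars in reversed(prog):
        if lhs in live:
            keep.append((lhs, rhs_vars))
        else:
            removed.append(lhs)
        live.discard(lhs)
        live |= set(rhs_vars)  # plain liveness: uses in dead assignments still count
    keep.reverse()
    return keep, removed


def dae_until_stable(prog, live_out):
    """Iterate dae_round until a round removes nothing; return (program, rounds)."""
    rounds = []
    while True:
        prog, removed = dae_round(prog, live_out)
        if not removed:
            return prog, rounds
        rounds.append(removed)


prog = [("a", {"a"}), ("b", {"a"}), ("c", {"b"})]  # a := a+1; b := a+1; c := b+1
print(dae_until_stable(prog, set()))  # ([], [['c'], ['b'], ['a']])
```

The three rounds remove $c$, $b$ and $a$ in this order, one round per element of the def-use chain $a, b, c$.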



However, at least the def-use chain itself is acyclic. Therefore, let us denote
the length of a maximal acyclic def-use chain in $G$ by $\Delta^{def\text{-}use}$. Clearly,
for PDCE the factor $\Delta^{def\text{-}use}$ is an upper approximation of $\Delta^{init\text{-}elims}$.
Similarly, this also applies to PRAE. Putting this together with the results
of Theorem 7.3.5 and Lemma 7.3.4, we get in analogy to Theorem 7.3.4:

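Theorem 7.3.6 (Complexity of PRAE and PDCE). For both PRAE
and PDCE the iteration sequence of the form $(\mathrm{EL}^\mu_\parallel)^*; (\mathrm{AM}^\mu_\parallel; \mathrm{EL}^\mu_\parallel)^*(G)$
stabilises within

$\Delta^{def\text{-}use}$ elimination steps and

$\hat{\Delta}^{block}$ simultaneous assignment motion and elimination steps,

reaching a result that is $\precsim_{ass}$-optimal in the universe $\mathcal{U}_{PRAE}(G)$ and
$\mathcal{U}_{PDCE}(G)$, respectively. This means that the iteration sequence

$$(\mathrm{EL}^\mu_\parallel)^{\Delta^{def\text{-}use}+1}; (\mathrm{AM}^\mu_\parallel; \mathrm{EL}^\mu_\parallel)^{\hat{\Delta}^{block}+1}(G)$$

is stable up to local reorderings within basic blocks.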
We finish the section by presenting a surprising improvement on the
complexity result for PFCE (cf. Theorem 7.3.4). The key for this modification
is Lemma 7.3.4, where we proved that PDCE has only weak EE-effects.
Clearly, a program without faint assignments is one without dead
assignments, too. In fact, analysing the proof of the weakness of EE-effects in
PDCE, we can even show a stronger condition than Implication (7.11). For
$\mathrm{AS}_\alpha \equiv G \mapsto_\alpha G' \in \mathcal{AS}$ and $\mathrm{DAE}_\alpha \equiv G' \mapsto_\alpha G'' \in \mathcal{DAE}$ such that $G$ does
not contain faint occurrences of $\alpha$ we have:

$$\mathit{Faint}(\mathrm{AS}_\alpha; \mathrm{DAE}_\alpha(G), occ_\beta) \;\Rightarrow\; \mathit{Dead}(\mathrm{AS}_\alpha(G), occ_\beta) \qquad (7.14)$$

Following the lines of the proof of Theorem 7.3.5, no proper faint code can
show up after an initial faint assignment elimination. In other words, faint
assignments, which require a more costly analysis, can be treated completely
in advance. In practice, this approach should even be favoured over PDCE,
since we get rid of the initial sequence of simultaneous dead assignment
eliminations at the cost of a single simultaneous faint assignment elimination
step. Hence we have:

Theorem 7.3.7 (An Improved Upper Bound for PFCE).

For PFCE the iteration sequence of the form $\mathrm{FAE}^\mu_\parallel; (\mathrm{AS}^\mu_\parallel; \mathrm{DAE}^\mu_\parallel)^*(G)$
stabilises within $\hat{\Delta}^{block}$ simultaneous assignment sinking and dead assignment
elimination steps, reaching a result that is $\precsim_{ass}$-optimal in the universe
$\mathcal{U}_{PFCE}(G)$.
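On straight-line code the benefit of treating faint assignments in advance is easy to see. A single backward pass, in which the operands of an assignment only become relevant if the assignment itself is kept, removes a whole chain of useless assignments at once. This sketch relies on that straight-line assumption. In the presence of loops, for example an assignment a := a + 1 inside a loop whose value is never used elsewhere, faintness requires a genuine fixed point analysis. The function name and the pair representation are hypothetical.

```python
def fae_straight_line(prog, live_out):
    """Faint assignment elimination for straight-line code in one backward pass.

    prog is a list of (lhs, rhs_vars); live_out holds the variables used afterwards.
    """
    needed, keep = set(live_out), []
    for lhs, rhs_vars in reversed(prog):
        if lhs in needed:
            keep.append((lhs, rhs_vars))
            needed.discard(lhs)
            needed |= set(rhs_vars)  # only operands of kept assignments matter
        # otherwise the assignment is faint and is dropped without recording its uses
    keep.reverse()
    return keep


prog = [("a", {"a"}), ("b", {"a"}), ("c", {"b"}), ("d", {"p"})]
print(fae_straight_line(prog, {"d"}))  # [('d', {'p'})]
```

For the first three assignments, repeated dead assignment elimination would need three rounds. This single pass removes them in one step.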



7.4 UPRE as an Advanced Application of MECs

As the remaining application of Chapter 6 we are left with UPRE.
Unfortunately, UPRE as introduced is not an MEC, since we have to cope with
expression and assignment motions at once. However, we will see that the
situation can be reduced in a way such that the central results for PRAE
can be exploited.

As opposed to Chapter 4, here expression motions may be applied at distinct
intermediate stages rather than in a closed setting. This new situation leads
to two small modifications of the definition of admissible expression motion
(cf. Definition 3.1.1), which are made only for technical reasons:

1. Simplification Convention: if the complete right-hand side expression of
an assignment $h_\varphi := \varphi$ has to be replaced by $h_\varphi$, then the whole
assignment is eliminated in order to suppress the generation of useless
assignments $h_\varphi := h_\varphi$.



Faint and Dead are used here in the same way as the predicate eliminable.
2. Naming Convention: the subscript of a temporary is associated with its
corresponding original expression rather than the renamed expression
that may be the result of previous expression motion steps. For instance,
if the original assignment $x := \varphi + \psi$ is turned into $x := h_\varphi + h_\psi$ due
to expression motion, the temporary associated with the right-hand side
term is still called $h_{\varphi+\psi}$ rather than $h_{h_\varphi + h_\psi}$.

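To formalise the latter convention, the original shape of an expression $\varphi$ shall
be denoted by $\bar{\varphi}$, which can be defined inductively:

$$\bar{\varphi} \stackrel{def}{=} \begin{cases} \varphi & \text{if } \varphi \text{ is a constant or variable} \\ \psi & \text{if } \varphi \text{ is a temporary } h_\psi \\ \omega(\bar{\varphi}_1, \dots, \bar{\varphi}_k) & \text{if } \varphi = \omega(\varphi_1, \dots, \varphi_k) \end{cases} \qquad (7.15)$$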

Furthermore, the notation $G \mapsto_{\varphi} G'$ shall indicate that $G'$ results from $G$ by an expression motion (in the above sense) with respect to the expression pattern $\varphi$.
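The inductive definition of the original shape can be read directly as a recursive function. The following Python sketch is purely illustrative: the classes `Var`, `Const`, `Temp` and `Op` are a hypothetical expression encoding (not taken from the book), and a temporary stores the original expression it was introduced for, as required by the naming convention.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Temp:
    # Naming convention: the temporary h_psi remembers the ORIGINAL expression psi.
    origin: "Expr"


@dataclass(frozen=True)
class Op:
    op: str
    args: Tuple["Expr", ...]


Expr = Union[Var, Const, Temp, Op]


def original_shape(e: Expr) -> Expr:
    """Original shape of an expression, following the three cases of the definition."""
    if isinstance(e, (Var, Const)):   # constants and variables are unchanged
        return e
    if isinstance(e, Temp):           # a temporary h_psi stands for psi
        return e.origin
    # an operator applied to arguments: rebuild it from the original shapes of the arguments
    return Op(e.op, tuple(original_shape(a) for a in e.args))
```

With this encoding, the original shape of $h_{\varphi} + h_{\psi}$ is $\varphi + \psi$, independently of how many expression motion steps produced the temporaries.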

7.4.1 Computationally Optimal UPRE

The primary goal of the uniform elimination of partially redundant expres- 
sions and assignments is to minimise the number of computations on every 
program path. This intent also fits our intuition that assignment motion is 
mainly employed as a catalyst in order to enhance the potential of expression 
motion. 

In analogy to the MECs considered before, the proof of expression optimality is essentially divided into two steps:

1. Proving confluence of UPRE and

2. Proving that the elementary steps conform with $\precsim_{\mathit{exp}}$.

Whereas the second point is obvious, the proof of the first point benefits from the confluence of PRAE by using the trick of simulating expression motion through assignment motion. To this end, we consider a program transformation that

— splits up every instruction $\alpha$ containing an occurrence of $\varphi$ into a sequence of assignments $h_{\varphi} := \varphi;\ \alpha[h_{\varphi}/\varphi]$,

where $\alpha[h_{\varphi}/\varphi]$ results from $\alpha$ by replacing all occurrences of $\varphi$ by $h_{\varphi}$. For instance, an assignment $x := \varphi$ is decomposed into the sequence of assignments $h_{\varphi} := \varphi;\ x := h_{\varphi}$. Note that splitting transformations are just a special kind of expression motion and, therefore, do not introduce a new type of transformation. We shall write $\mathrm{SP}_{\varphi}(G)$ for the program that results from $G$ by means of a (maximal) splitting transformation with respect to $\varphi$. Then we have:

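Lemma 7.4.1 (Simulation Lemma). $\mathrm{BEM}_{\varphi} = \mathrm{SP}_{\varphi};\ \mathrm{AH}_{\alpha};\ \mathrm{RAE}_{\alpha}$, where $\alpha \equiv h_{\varphi} := \varphi$.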
Proof. Let $G' = \mathrm{SP}_{\varphi}(G)$ and $\alpha \equiv h_{\varphi} := \varphi$. First let us take a look at the close connection between the expression motion related predicates in $G$ and the assignment motion related predicates in $G'$. In comparison to $G$, the program $G'$ may have some additional program points in between the two assignments of split instructions. Let us identify all other program points in $N_G$ and $N_{G'}$, i.e. all program points apart from the new ones caused by splittings. With this, the proposed correspondences read as:



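$$\mathrm{AH}_{\alpha}\text{-}\mathit{Just}_n \;=\; \begin{cases} \mathit{DnSafe}^{\varphi}_n & \text{if } n \in N_G\\ \mathit{false} & \text{if } n \in N_{G'} \setminus N_G \end{cases} \tag{7.16a}$$

$$\mathrm{AH}_{\alpha}\text{-}\mathit{Subst}_n \;=\; \begin{cases} \mathrm{BEM}_{\varphi}\text{-}\mathit{Correct}_n & \text{if } n \in N_G\\ \mathit{false} & \text{if } n \in N_{G'} \setminus N_G \end{cases} \tag{7.16b}$$

$$\mathit{Redundant}^{\alpha}_n \;=\; \begin{cases} \mathit{UpSafe}^{\varphi}_n & \text{if } n \in N_G\\ \mathit{true} & \text{if } n \in N_{G'} \setminus N_G \end{cases} \tag{7.16c}$$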



All properties are immediate from the corresponding definitions. Moreover, the definition of $\mathrm{BEM}_{\varphi}$ (cf. page 25) and that of maximal assignment hoistings $\mathrm{AH}_{\alpha}$ (cf. page 161) provide the following relation between the insertions of the expression motion and those of the assignment motion:

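$$\forall n \in N_G.\ \ \mathrm{BEM}_{\varphi}\text{-}\mathit{Insert}_n \;\Rightarrow\; \mathrm{AH}_{\alpha}\text{-}\mathit{Insert}_n \tag{7.17}$$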

Hence we are left to show that exactly those $\mathrm{AH}_{\alpha}$-insertions that have a $\mathrm{BEM}_{\varphi}$-counterpart survive the final redundancy elimination:

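$$\forall n \in N_G.\ \ \mathrm{AH}_{\alpha}\text{-}\mathit{Insert}_n \;\Rightarrow\; \big(\mathrm{BEM}_{\varphi}\text{-}\mathit{Insert}_n \Leftrightarrow \neg\mathit{Redundant}^{\alpha}_n\big) \tag{7.18}$$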

This, however, is straightforward using the equivalences in (7.16). □ 
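To make the splitting transformation concrete, here is a minimal sketch of $\mathrm{SP}_{\varphi}$ applied to a single assignment. The encoding is our own assumption, not the book's: a string is a variable, `("h", e)` is the temporary $h_e$ named after the original expression `e` (so the tag `"h"` is reserved), `(op, l, r)` is an operation, and an assignment is a pair `(lhs, rhs)`. Initialisations $h_{\varphi} := \varphi$ are left untouched, so repeated splitting never produces $h_{\varphi} := h_{\varphi}$.

```python
def occurs(phi, e):
    """True if phi occurs in e; temporaries ("h", ...) are treated as atomic."""
    if e == phi:
        return True
    return isinstance(e, tuple) and e[0] != "h" and any(occurs(phi, a) for a in e[1:])


def replace(e, phi, h):
    """e[h/phi]: replace every occurrence of phi in e by the temporary h."""
    if e == phi:
        return h
    if isinstance(e, tuple) and e[0] != "h":
        return (e[0],) + tuple(replace(a, phi, h) for a in e[1:])
    return e


def split(instr, phi):
    """SP_phi on one assignment: h_phi := phi; instr[h_phi/phi]."""
    lhs, rhs = instr
    h = ("h", phi)  # naming convention: the temporary is named after phi itself
    if lhs == h or not occurs(phi, rhs):
        return [instr]  # initialisations of h_phi and phi-free instructions stay as they are
    return [(h, phi), (lhs, replace(rhs, phi, h))]
```

For instance, `split(("x", ("+", "a", "b")), ("+", "a", "b"))` yields the pair $h_{a+b} := a+b;\ x := h_{a+b}$, exactly the decomposition of $x := \varphi$ described above.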

Let us now summarise a few properties of the interaction of maximal splittings with other kinds of maximal transformations.

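Lemma 7.4.2 (Splitting Lemma). Let $\varphi, \psi \in \mathcal{EP}(G)$ and let us consider the following assignment patterns: $\alpha \equiv x := \varphi$, $\alpha_{\mathit{ins}} \equiv h_{\psi} := \psi$, and $\alpha_{\mathit{mod}} \equiv x := \varphi[h_{\psi}/\psi]$.

1. $\mathrm{SP}_{\varphi};\ \mathrm{SP}_{\psi} = \mathrm{SP}_{\psi};\ \mathrm{SP}_{\varphi}$

2. $\exists\, \mathrm{AH}_{\alpha_{\mathit{ins}}} \in \mathcal{AAH}_{\alpha_{\mathit{ins}}}.\ \ \mathrm{AH}_{\alpha};\ \mathrm{SP}_{\psi} = \mathrm{SP}_{\psi};\ \mathrm{AH}_{\alpha_{\mathit{ins}}};\ \mathrm{AH}_{\alpha_{\mathit{mod}}}$

3. $\psi \in \mathit{SubExpr}^{*}(\varphi) \Rightarrow \mathrm{RAE}_{\alpha};\ \mathrm{SP}_{\psi} = \mathrm{SP}_{\psi};\ \mathrm{RAE}_{\alpha_{\mathit{mod}}};\ \mathrm{RAE}_{\alpha_{\mathit{ins}}}$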

Proof. Part (1) is trivial if $\varphi$ and $\psi$ are independent, i.e. neither of the expressions is a subexpression of the other one. Otherwise, without loss of generality, let us assume that $\psi \in \mathit{SubExpr}^{*}(\varphi)$. In this case the naming convention guarantees that every instruction $\alpha$ that contains an occurrence of $\varphi$ is uniquely decomposed into the sequence

$h_{\psi} := \psi;\ h_{\varphi} := \varphi[h_{\psi}/\psi];\ \alpha[h_{\varphi}/\varphi]$

Part (2) can be easily checked by means of a careful analysis of the justifi- 
cation predicates triggering the various assignment hoisting transformations 
along the lines of the proof of Lemma 7.4.1. 

For the proof of Part (3) let us assume that $\psi \in \mathit{SubExpr}^{*}(\varphi)$ and consider a $G$-occurrence of the assignment $x := \varphi$. It is easy to see that $x := \varphi[h_{\psi}/\psi]$ is redundant in $\mathrm{SP}_{\psi}(G)$ if and only if $x := \varphi$ is redundant itself. In this case all initialisation statements $h_{\psi} := \psi$ that immediately precede a redundant occurrence of $x := \varphi[h_{\psi}/\psi]$ become redundant, too, whereas all other occurrences of $h_{\psi} := \psi$ are used. □

This result is the key to proving local confluence of $\mapsto$, which stands for any kind of elementary transformation of type $\mapsto_{\varphi}$ and $\mapsto_{\alpha}$. In fact, we have:

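Lemma 7.4.3 (Local Confluence of UPRE). $\mapsto$ is locally confluent, i.e. two transformations $\mathrm{TR}_1$ and $\mathrm{TR}_2$ that are applied to a program $G_0$ can be completed as shown below:

[Diagram: $\mathrm{TR}_1$ and $\mathrm{TR}_2$ lead from $G_0$ to $G_1$ and $G_2$, respectively; both are completed to a common program $G_3$.]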

Proof. Most of the work is already done in the proofs of Theorem 7.2.2 and Theorem 7.2.3. We are only left with the case that $\mathrm{TR}_1$ or $\mathrm{TR}_2$ refers to an expression motion. Let $\mathrm{TR}_1 = G_0 \mapsto_{\varphi} G_1$. Since $\mathrm{TR}_1$ can be completed to $\mathrm{BEM}_{\varphi}$ by appending a suitable $\varphi$-expression motion, and $\mathrm{TR}_2$ can be chosen maximal as well, we only have to prove local confluence for the three situations pictured below, where the $\mathrm{BEM}_{\varphi}$-transitions are split up following Lemma 7.4.1:
Let us now examine the situation in more detail:

Case a): We demonstrate the most intricate subcase, where $\psi \in \mathit{SubExpr}(\varphi)$. In this case we succeed by “clustering” the plane with patterns for which local confluence has already been proved. This is illustrated in Figure 7.12(a), where pattern (I) is due to Splitting Lemma 7.4.2(2), pattern (II) is a simple consequence of the maximality of $\mathrm{AH}_{\alpha}$, and patterns (III) and (IV) are due to cases 4 and 5 in the proof of Theorem 7.2.2, respectively.

Case b): Again we only give the argument for the subcase where $\psi \in \mathit{SubExpr}(\varphi)$. Figure 7.12(b) shows the corresponding clustering. Pattern (I) is due to Splitting Lemma 7.4.2(3), pattern (II) follows from case 6 in the proof of Theorem 7.2.2, and (V) is a consequence of the maximality of $\mathrm{RAE}_{\alpha}$. Finally, all remaining “diamond patterns” (III), (IV) and (VI) follow from cases 3 and 5 in the proof of Theorem 7.2.2.

Case c): In this case the full clustering becomes quite large. However, the interesting part is displayed in Figure 7.12(c). The patterns marked (I), (II) and (III) follow from Splitting Lemma 7.4.2(1)-(3), respectively. The unprocessed part in the middle can easily be completed by using “diamond patterns” according to cases 3 to 5 in the proof of Theorem 7.2.2. □
Fig. 7.12. Proving local confluence of UPRE by clustering
Since no expression motion can be reversed (as opposed to local assignment motions), we can immediately adopt the notion of program equivalence of Definition 7.2.2. Then Lemma 7.2.1 also fits the new situation, which allows us to use the same line of argumentation as in Section 7.2.3. This leads us to:

Theorem 7.4.1 (Confluence Theorem for UPRE). UPRE is confluent, i.e. its associated relation $\mapsto$ is confluent.

Up to this point splitting transformations were mainly used on a conceptual level, easing, for instance, the reasoning on confluence. Now we show that splitting steps are also useful to obtain a concrete iteration strategy that stabilises reasonably fast. After a preprocessing phase of splitting transformations, expression motion is entirely covered by assignment motion and redundant assignment elimination. According to Lemma 7.4.2(1) the maximal splitting transformations can be processed in any order. Thus, in accordance with expression motion, where transformations were extended to sets of expressions, we denote a complete splitting step, i.e. a sequence of maximal splittings performed for each expression pattern in $\Phi := \mathcal{EP}(G)$, by $\mathrm{SP}_{\Phi}(G)$.³⁰ Then we have:

Theorem 7.4.2 (Complexity of UPRE). For UPRE the sequence $\mathrm{SP}_{\Phi};\ \mathrm{RAE}^{\mu};\ (\mathrm{AH}^{\mu};\ \mathrm{RAE}^{\mu})^{*}(G)$ stabilises within

— one initial complete splitting step,

— simultaneous redundant assignment elimination steps and

— simultaneous assignment hoisting and redundant assignment elimination steps,

reaching a result that is $\precsim_{\mathit{exp}}$-optimal in the universe $\mathcal{G}_{\mathit{UPRE}}(G)$.

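Proof. According to Theorem 7.3.6 the iteration sequence

$$\underbrace{\mathrm{RAE}^{\mu};\ (\mathrm{AH}^{\mu};\ \mathrm{RAE}^{\mu})^{*}(\underbrace{\mathrm{SP}_{\Phi}(G)}_{G'})}_{G''}$$

stabilises within the proposed bound if $\mathcal{G}_{\mathit{PRAE}}(G')$ is considered as the underlying universe. Hence we are left to show that no expression motion can modify the resulting program $G''$. Let an expression motion with respect to $\varphi$ be applicable to $G''$. Then in particular $\mathrm{BEM}_{\varphi}$ is applicable to $G''$. According to the Simulation Lemma 7.4.1 we have:

$$\mathrm{BEM}_{\varphi}(G'') = \mathrm{SP}_{\varphi};\ \mathrm{AH}_{\alpha};\ \mathrm{RAE}_{\alpha}(G''), \quad \text{where } \alpha \equiv h_{\varphi} := \varphi$$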



³⁰ In practice, $\mathrm{SP}_{\Phi}$ proceeds from large expressions down to more elementary ones.
Since $\varphi$ only occurs in assignments $h_{\varphi} := \varphi$ after the initial splitting transformations, our naming convention ensures that a further splitting no longer has an effect. Hence:

$$\mathrm{BEM}_{\varphi}(G'') = \mathrm{AH}_{\alpha};\ \mathrm{RAE}_{\alpha}(G'')$$

Due to the stability of $G''$, both steps on the right-hand side leave $G''$ invariant up to local reorderings of assignments. Finally, according to Lemma 7.2.4, the $\precsim_{\mathit{exp}}$-optimality of $G''$ is a consequence of the fact that all elementary transformations are well-behaved with respect to $\precsim_{\mathit{exp}}$.

7.4.2 UPRE and Code Decomposition

The importance of Theorem 7.4.2 lies in the fact that expression optimality in the domain $\mathcal{G}_{\mathit{UPRE}}$ is actually independent of the decomposition of large expressions. In fact, it does not matter whether a complex expression is split up in the original program or by the algorithm. This is in contrast to the dilemma in expression motion, where splitting expressions simplifies the analyses, but comes at the price of poor results. On the other hand, in practice we are sometimes faced with intermediate languages where large expressions are already split. Figure 7.13 presents the facets of this problem. The point here is that the loop-invariant large expression can be moved out of the loop using the techniques of Chapter 4 if the large expression is part of a structured set of expressions. On the other hand, after decomposing the large expression (perhaps due to the generation of 3-address code), expression motion fails to move all parts of the expression out of the loop, since the assignment to t defines a modification of t + c. In contrast, UPRE succeeds in both cases, where the final programs only differ in the names of variables.
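The following sketch illustrates only the splitting pre-process, not the full UPRE iteration. It uses a hypothetical tuple encoding (a string is a variable, `("h", e)` is the temporary named after the original expression `e`, `(op, l, r)` is an operation, and `out` is modelled as a pseudo-variable assigned by the output statement) and processes the patterns from large to elementary expressions, as suggested for $\mathrm{SP}_{\Phi}$. Applied to out((a+b)+c) from Figure 7.13, it yields a 3-address shape in which $h_{a+b}$ plays the role of the variable t of the decomposed program.

```python
def occurs(phi, e):
    """True if phi occurs in e; temporaries ("h", ...) are atomic."""
    if e == phi:
        return True
    return isinstance(e, tuple) and e[0] != "h" and any(occurs(phi, a) for a in e[1:])


def replace(e, phi, h):
    """e[h/phi]."""
    if e == phi:
        return h
    if isinstance(e, tuple) and e[0] != "h":
        return (e[0],) + tuple(replace(a, phi, h) for a in e[1:])
    return e


def split(instr, phi):
    """Maximal splitting w.r.t. phi on one assignment (lhs, rhs)."""
    lhs, rhs = instr
    h = ("h", phi)
    if lhs == h or not occurs(phi, rhs):
        return [instr]
    return [(h, phi), (lhs, replace(rhs, phi, h))]


def subterms(e):
    """All non-atomic subexpressions of e."""
    if isinstance(e, tuple) and e[0] != "h":
        yield e
        for a in e[1:]:
            yield from subterms(a)


def size(e):
    if isinstance(e, tuple) and e[0] != "h":
        return 1 + sum(size(a) for a in e[1:])
    return 1


def complete_split(program):
    """Split w.r.t. every expression pattern, largest patterns first (ties broken by repr)."""
    patterns = {p for _, rhs in program for p in subterms(rhs)}
    for phi in sorted(patterns, key=lambda p: (size(p), repr(p)), reverse=True):
        program = [s for instr in program for s in split(instr, phi)]
    return program


AB = ("+", "a", "b")
ABC = ("+", AB, "c")
print(complete_split([("out", ABC)]))
```

The result is $h_{a+b} := a+b;\ h_{(a+b)+c} := h_{a+b} + c;\ \mathit{out}(h_{(a+b)+c})$; thanks to the naming convention, the temporaries keep the names of the original expressions even though the second initialisation has been rewritten.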



7.4.3 UPRE with Trade-Offs between Lifetimes of Variables

We conclude this section by showing how the techniques from Section 4.4 can beneficially be employed in order to minimise the lifetimes of variables. In [KRS95] we showed that within the set of computationally optimal results of UPRE neither the number of assignments nor the lifetimes of temporaries, as secondary goals, can be uniformly minimised. However, as in Chapter 4, the reasoning on lifetimes of temporaries was implicitly based on a notion of lifetime ranges that is not adequate. In this section we therefore develop an adaptation of the fully flexible expression motion strategy of Chapter 4 that works for UPRE. In particular, as opposed to the presentation in [KRS95], we make the following assumptions:

— We do not distinguish between temporaries that are introduced by expression motion (or the splitting transformations) and variables of the program. This makes the approach applicable regardless of whether large expressions are decomposed in advance or during UPRE.
[Figure: panels showing out((a+b)+c) and the variable t; arrows labelled "Decompose" and "UPRE"; expression set {a+b, (a+b)+c}]

Fig. 7.13. Impact of decomposing large expressions



— End points of lifetime ranges are determined through actual use points 
of variables. This is in contrast to [KRS95], where the lifetime ranges of 
temporaries were (implicitly) assumed to end at the original use sites. 

Before we sketch the algorithm, let us reconsider the notion of a lifetime range. Throughout this section we will not develop a fully formal presentation, but rather sketch the essentials, which are sufficient for the construction.

Lifetime Ranges of Variables Essentially we get along with a simple notion of lifetime ranges in the flavour of Definition 3.3.1. However, there are some subtle differences that should be mentioned.

— Lifetime ranges are defined in a uniform way for both program variables and temporaries. In fact, we no longer distinguish between these two kinds of variables, and consider them uniformly to stand for symbolic registers.

— Lifetime ranges are determined through the motion-invariant program points in the flow graph under investigation, i.e. basic block entries and exits as well as the entries and exits of immobile statements. This is justified, as locally, within a basic block and between such program points, sequences of assignments can move freely.³¹



³¹ Note that our reasoning again rests on the assumption that the number of symbolic registers is not bounded, which strictly separates our algorithm from the NP-complete register allocation problem.
— Variables are assumed to be initialised on every program path leading to a use site.³²

Using this notion of lifetime ranges, a preorder that reflects the overall lifetimes of temporaries can be defined straightforwardly in the style of Definition 4.2.2.



The Algorithm 

The basic idea of an algorithm to compute a lifetime optimal representative among the computationally optimal results in $\mathcal{G}_{\mathit{UPRE}}(G)$ is completely along the lines of Section 4.5. Again the key is to model situations that can be used for profitable trade-offs between lifetimes of symbolic registers by means of labelled DAGs, where labelled dependency DAGs here play the role of the labelled expression DAGs in Section 4.5.3. However, the construction of labelled dependency DAGs is more complicated than in the expression motion setting, which is reflected by the fact that their construction requires the full process of an exhaustive assignment sinking.

Construction of labelled dependency DAGs: starting with a computationally optimal program $G' \in \mathcal{G}_{\mathit{UPRE}}(G)$, maximal assignment sinkings are performed until the program stabilises. At the program points that characterise lifetime ranges, i.e. the entries and exits of basic blocks and immobile statements, the assignments that are moved across such a point successively enter into the construction of the labelled dependency DAG. Leaves of the DAG are annotated by variables³³ and inner vertices are annotated by assignment patterns. Edges reflect the execution-order dependencies. An inner vertex is associated with an assignment that has to be executed after all the assignments of its successor vertices and after the leaf variables have been assigned (cf. Figure 7.14).
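The construction of a dependency DAG at one program point can be sketched as follows. This is a hypothetical simplification, not the book's formulation: the input is the ordered list of assignments that cross the point during exhaustive sinking, each given as its left-hand side and its variable operands (constants are ignored, as in the text), and an edge `(i, s)` means that assignment `i` has to be executed after `s`, where `s` is either another assignment index or a leaf `("leaf", v)`. Besides flow dependences we also record anti and output dependences, which is our reading of "execution-order dependencies".

```python
def dependency_dag(assignments):
    """assignments: list of (lhs, operands) in execution order, e.g. ("t", ["a", "b"]).
    Returns (edges, leaves); an edge (i, s) means assignment i must run after s."""
    edges, leaves = set(), set()
    last_def = {}  # variable -> index of its latest defining assignment so far
    readers = {}   # variable -> assignments that read it since that definition
    for i, (lhs, operands) in enumerate(assignments):
        for v in operands:
            if v in last_def:
                edges.add((i, last_def[v]))   # flow dependence on the defining assignment
            else:
                leaves.add(v)                 # value comes from outside the sequence
                edges.add((i, ("leaf", v)))
        for j in readers.get(lhs, []):
            edges.add((i, j))                 # anti dependence: j reads lhs before i overwrites it
        if lhs in last_def:
            edges.add((i, last_def[lhs]))     # output dependence
        for v in operands:
            readers.setdefault(v, []).append(i)
        last_def[lhs] = i
        readers[lhs] = []                     # earlier readers are ordered via the output dependence
    return edges, leaves
```

For the sequence t := a+b; x := t+c; a := x, the third assignment depends on the second (it reads x) and on the first (it overwrites a, which the first one reads).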

After their construction the dependency DAGs are labelled. To this end, the assignment sinking is first undone, re-establishing the starting situation of program $G'$.³⁴ A dependency DAG is labelled with symbols of the set {•, ■, ○, □} in the same way as an expression DAG (cf. Section 4.5.3). The first property pair is particularly easy: leaf vertices address register expressions, while inner ones refer to initialisation candidates. The second property pair requires more effort. In order to tell release candidates from the other ones we have to evaluate whether the variable associated with
³² This assumption is reasonable for well-formed programs. However, even otherwise we can always assume that the lifetime range starts in s whenever an initialisation is absent on a path leading to a use site. Note, however, that this condition is automatically guaranteed for temporary variables.

³³ That means, program variables or temporaries. Constant operands are simply ignored.

³⁴ In practice, the initial situation should be memorised and re-established.
a vertex³⁵ is definitely used at a strictly later program point. Note that this property in particular cannot hold for any vertex whose associated variable also occurs at an upper position in the same DAG. Otherwise, a simple live variable analysis [ASU85]³⁶ exactly determines those variables that are used later.
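The live variable analysis [ASU85] mentioned above is the classical backward data flow problem. The following round-robin sketch assumes a hypothetical encoding of the flow graph by successor lists and per-node use/def sets (uses meaning variables read before being written in the node). Under the reading given above, a variable that is live after a program point is still used later, so its vertex would not be classified as a release candidate there.

```python
def live_variables(succ, use, defs):
    """Iterative backward liveness analysis.
    succ: node -> list of successor nodes (every node must be a key)
    use:  node -> set of variables read before being written in the node
    defs: node -> set of variables written in the node
    Returns (live_in, live_out) as dicts of sets."""
    nodes = list(succ)
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:  # iterate until the maximal fixed point... of the union-based equations is reached
        changed = False
        for n in reversed(nodes):  # reverse order tends to converge faster for backward problems
            out = set().union(*(live_in[s] for s in succ[n]))
            inn = use[n] | (out - defs[n])
            if out != live_out[n] or inn != live_in[n]:
                live_out[n], live_in[n] = out, inn
                changed = True
    return live_in, live_out
```

The iteration terminates because the sets only grow and are bounded by the finite set of variables.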

Optimal trade-offs: optimal trade-offs of lifetime ranges can be determined along the lines of Section 4.5. A labelled dependency DAG is reduced to a bipartite graph whose optimal tight set delivers the optimal trade-off decision for the program point (cf. Theorem 4.5.1).
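The reduction to bipartite graphs is only referenced here. As an illustration, we assume (our reading, not a definition given in this section) that an optimal tight set is a subset X of one vertex side that maximises |X| − |N(X)|, where N(X) is its neighbourhood. Such a set can be obtained from a maximum matching by collecting the vertices reachable via alternating paths from the unmatched vertices of that side. The sketch uses Kuhn's augmenting-path algorithm, and the dictionary-based graph encoding is hypothetical.

```python
from collections import deque


def max_matching(left, adj):
    """Kuhn's augmenting-path algorithm; adj maps a left vertex to its right neighbours.
    Returns a dict mapping each matched right vertex to its left partner."""
    match_r = {}

    def augment(u, seen):
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                if v not in match_r or augment(match_r[v], seen):
                    match_r[v] = u
                    return True
        return False

    for u in left:
        augment(u, set())
    return match_r


def max_deficiency_set(left, adj):
    """Left vertices reachable by alternating paths from unmatched left vertices.
    For this set X, |X| - |N(X)| equals the number of unmatched left vertices,
    which is the largest deficiency any subset of `left` can have."""
    match_r = max_matching(left, adj)
    matched = set(match_r.values())
    reached = {u for u in left if u not in matched}
    queue = deque(reached)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            w = match_r.get(v)  # v is matched, otherwise the matching would not be maximum
            if w is not None and w not in reached:
                reached.add(w)
                queue.append(w)
    return reached
```

For example, if three left vertices all compete for a single right vertex, all three are returned (deficiency 2); for a perfect matching the result is empty.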

The transformation: the trade-off information gives rise to a guided variant 
of exhaustive assignment sinking, where the sinking of assignments is 
prematurely stopped whenever this is justified by a profitable trade-off 
between the lifetimes of variables. This can easily be accomplished by 
adjusting the predicate that drives the sinking process. 

Note that the final flush phase of [KRS95] only performs the brute-force 
sinking of the first step, which here just serves as an intermediate step to 
gather enough information to guide the “true” sinking in a more “intelligent” 
way. In fact, adapting the arguments of Section 4.5 we are able to prove: 

Theorem 7.4.3 (Lifetime Optimal UPRE). The UPRE-algorithm described above leads to a lifetime optimal result among all computationally optimal results in $\mathcal{G}_{\mathit{UPRE}}(G)$.




Fig. 7.14. The dependency DAG associated with the exit of basic block 3



³⁵ That is, the left-hand side variable of the assignment annotated at an inner vertex or the variable annotated at a leaf vertex.

³⁶ Note that this is an analysis on the original flow graph.
8. Assignment Motion in the Presence of Critical Edges



Analogously to the first part of the book, the final chapter of the second 
part is also devoted to the impact of critical edges. Unlike in expression 
motion, where at least some results were already well-known, the situation 
in assignment motion is completely unexplored. Surprisingly, however, here 
the presence of critical edges has even more serious consequences than in 
expression motion. We start this chapter by considering the situation that 
is characterised by the straightforward adaption of the notion of motion- 
elimination couples and later also examine enhanced variants revealing an 
interesting precision-complexity trade-off between both alternatives. 



8.1 Straightforward Adaption of MECs 

Unfortunately, a straightforward adaption of the notion of MECs from the situation of flow graphs without critical edges is not satisfactory, for different reasons.

8.1.1 Adequacy 

The most significant deficiency is that the results that can be obtained in $\mathfrak{FG}_{\mathit{crit}}$ are poor compared to the ones in $\mathfrak{FG}$. This is illustrated in Figure 8.1 for PDCE. The point here is that after splitting critical edges, PDCE would result in the program of Figure 8.1(d). The important observation now is that the resulting program does not actually insert code at synthetic nodes, since the assignment at node $S_{1,4}$ is dead and thus can be eliminated. However, the transformation is not in the scope of the adapted version of PDCE, since any admissible assignment sinking has to substitute the assignment at node 1 on the path going through node 4. The absence of the assignment pattern x := a + b on the path that goes through node 2, however, prevents any movement of the assignment at node 1. Hence the results that can be obtained by means of a straightforward adaption are not adequate in the following sense.

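Definition 8.1.1 ((Uniformly) M-Adequate MECs). Let $\mathcal{M}_c$ be an MEC in $\mathfrak{FG}_{\mathit{crit}}$, $\mathcal{M}$ an MEC in $\mathfrak{FG}$, $G_{\mathit{res}} \in \mathfrak{FG}_{\mathit{crit}}$ and $G' \in \mathfrak{FG}$. Let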



O. Ruething: Interacting Code Motion Transformations, LNCS 1539, pp. 199-210, 1998. 
© Springer-Verlag Berlin Heidelberg 1998 
longer if the underlying domain of programs is extended towards $\mathfrak{FG}_{\mathit{crit}}$. As an example we again consider PDCE, which is still consistent (cf. Definition 7.2.1) in the domain $\mathfrak{FG}_{\mathit{crit}}$, since the part of the proof of Theorem 7.2.3 that shows consistency does not use the absence of critical edges. On the other hand, Figure 8.2 gives an example where confluence is violated. The point of this example is that an initial elimination of the dead assignment at node 2 destroys the opportunity for a further assignment sinking, whereas an initial assignment sinking is the better choice, because the elimination potential is still preserved. This finally leads to a program $G_2$ that is strictly better in the number of assignments than $G_3$.



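[Figure: flow graph with node 5 containing x := a+b; out(x), node 8 containing x := 2, and node 6; a transition labelled "dae"]

Fig. 8.2. Non-confluence of PDCE in the domain $\mathfrak{FG}_{\mathit{crit}}$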



8.1.3 Efficiency 

Besides the drawbacks sketched in the previous two subsections, one argu- 
ment speaks for the straightforward adaption as considered in this section. 
This is the fact that all the results on the complexity of PDCE, PFCE, PRAE and UPRE carry over to the corresponding MECs operating on $\mathfrak{FG}_{\mathit{crit}}$. Thus we have:

Theorem 8.1.1 (Complexity of PDCE, PFCE, PRAE & UPRE).

The complexity results of Theorem 7.3.4, Theorem 7.3.6, Theorem 7.3.7 and Theorem 7.4.2 remain true when applied within the domain $\mathfrak{FG}_{\mathit{crit}}$.



8.2 Enhanced UMECs 

Fortunately, the deficiencies of the straightforward adaption can be avoided. However, it will turn out that the remedy of both deficiencies has to be paid for by a significant increase in the number of iterations. This observation provides a strong and new argument for splitting critical edges that is even more important than the known drawback of higher solution costs of bidirectional data flow analyses (cf. Section 4.6). In fact, the same argument as presented in this section definitely applies to the extensions of Morel and Renvoise’s algorithm [Dha91, DRZ92, Cho83].

The deficiencies of Section 8.1 can be resolved by enhancing the assignment 
motions under consideration. Conceptually, this is accomplished by incorpo- 
rating eliminations already into the assignment motions. 

Definition 8.2.1 (Enhanced Assignment Motion). Let $(\mathcal{T}_{am}, \mathcal{T}_{el})$ be an MEC. An enhanced assignment motion is a program transformation that (conceptually) results from the following two-step procedure:

1. Perform an assignment motion $\mathrm{AM}_{\alpha} = G \mapsto_{\alpha} G' \in \mathcal{T}_{am}$ according to Definition 6.1.1.¹

2. Remove all synthetic nodes in the resulting program $G'$.

Obviously, this definition is not sound unless the second step only removes irrelevant code. Therefore, we define:

Definition 8.2.2 (Admissible Enhanced Assignment Motion).

An enhanced assignment motion is admissible iff the assignment motion of step 1 in Definition 8.2.1 additionally only inserts assignments at synthetic nodes if they are eliminable.
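Definition 8.2.2 can be illustrated by a small sketch that first splits critical edges, recording the synthetic nodes, and then checks a set of insertions against an eliminability predicate. The graph encoding, the naming `S<u>,<v>` for synthetic nodes and the `eliminable` callback are hypothetical; the callback merely stands in for the dead or faint code analysis of the underlying MEC.

```python
def split_critical_edges(succ):
    """Insert a synthetic node on every critical edge (u, v), i.e. an edge whose
    source has several successors and whose target has several predecessors.
    Returns (new successor map, set of synthetic nodes)."""
    preds = {}
    for u, vs in succ.items():
        for v in vs:
            preds.setdefault(v, []).append(u)
    new_succ = {u: list(vs) for u, vs in succ.items()}
    synthetic = set()
    for u, vs in succ.items():
        if len(vs) < 2:
            continue
        for v in vs:
            if len(preds.get(v, [])) >= 2:
                s = f"S{u},{v}"
                synthetic.add(s)
                new_succ[u] = [s if w == v else w for w in new_succ[u]]
                new_succ[s] = [v]
    return new_succ, synthetic


def admissible_insertions(insertions, synthetic, eliminable):
    """Every insertion (node, pattern) placed at a synthetic node must be eliminable there."""
    return all(node not in synthetic or eliminable(node, pattern)
               for node, pattern in insertions)
```

Insertions at ordinary nodes are not constrained by this check; only code that would be left at a synthetic node must be removable before the synthetic nodes are dropped again.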

In accordance with the notation for “plain” assignment motions, we use the corresponding arrow notations for the enhanced counterparts and denote the sets of enhanced admissible assignment sinkings and hoistings with respect to $\alpha \in \mathcal{AP}(G)$ by $\mathcal{EAAS}_{\alpha}$ and $\mathcal{EAAH}_{\alpha}$, respectively. Finally, enhanced admissible assignment motions induce a notion of enhanced MECs, for short EMECs, and uniform enhanced MECs, for short EUMECs.

¹ That means, an assignment motion in $\mathfrak{FG}$.
8.2.1 Adequacy

Since in CUMECs eliminable code stays eliminable, it is easy to see that the eliminable occurrences in synthetic nodes can be removed immediately. Thus we have:

Theorem 8.2.1 (Adequacy Theorem). For any CUMEC in $\mathfrak{FG}$ the enhanced counterpart is adequate.

However, even CUMECs with enhanced assignment motions need not be strongly adequate. Figure 8.3 shows the reason for this behaviour by means of the enhanced variant of PDCE. In essence, the point of this example is that, although the final program depicted in Part (d) does not contain any code in synthetic nodes, this cannot be achieved without intermediate steps like the ones in Part (b) that place non-eliminable assignments at synthetic nodes.





Fig. 8.3. Lack of strong adequacy in enhanced critical PDCE: a) the starting situation, b) after a simultaneous assignment sinking and a simultaneous dead assignment elimination two occurrences remain at synthetic nodes, c) the situation after a further simultaneous assignment sinking and dead code elimination step, d) the resulting program after a final assignment sinking and dead code elimination
wrt a ~ 2. 
8.2.2 Confluence

Fortunately, the enhanced versions also remedy the loss of confluence observed for consistent MECs in the straightforward adaption (cf. Section 8.1.2). Thus, as the counterpart of Theorem 7.2.2, we have:

Theorem 8.2.2 (Confluence of Consistent EUMECs).

Let $(\mathcal{T}_{am}, \mathcal{T}_{el})$ be a consistent EUMEC. Then $\mapsto$ is confluent.

Proof. The proof benefits from the construction in the proof of Theorem 7.2.2. This is accomplished by decomposing an admissible enhanced assignment motion $\mathrm{EAM}_{\alpha} = G \mapsto G''$ into a sequence of an admissible assignment motion $\mathrm{AM}_{\alpha} = G \mapsto_{\alpha} G'$ and an elimination step $\mathrm{EL}_{\alpha} = G' \mapsto_{\alpha} G''$, where $\mathrm{EL}_{\alpha}$ eliminates the $G'$-occurrences of $\alpha$ that are inserted at synthetic nodes. Hence the arguments for confluence can be reduced to their counterparts in the case analysis of the proof of Theorem 7.2.2. We demonstrate this by means of Case 2, which is actually the most interesting one. Assuming the situation of Figure 8.4(a), the decomposition yields a situation as depicted in Figure 8.4(b). Note that the eliminations are assumed to be maximal, which can always be established by appending suitable transformations. Using the patterns for local confluence as given in the proof of Theorem 7.2.2, the diagram of Figure 8.4(b) can be completed as shown in Figure 8.5(a). The upper diamond is according to Case 2, while the other two situations are according to Case 6. It should be noted that the pattern of Case 6 is slightly modified, as maximal eliminations are used. However, the proof can easily be modified for this additional feature.² On the other hand, $G_3$ in Figure 8.5(a) can easily be chosen in such a way that every inserted instance of $\alpha$ is an insertion of $\mathrm{TR}_1$ or $\mathrm{TR}_2$. Then consistency ensures that every insertion at synthetic nodes in $G_3$ is eliminable. This finally allows us to complete the situation of Figure 8.4(a) as shown in Figure 8.5(b). In fact, we even have a “direct” enhanced assignment motion that dominates the other two. □

Fig. 8.4. Reducing Case 2 to the situation in $\mathfrak{FG}$

² More specifically, $\mathrm{TR}_3$ and $\mathrm{TR}_5$ in Case 6 of the proof of Theorem 7.2.2 can be chosen maximal, which only eliminates some additional $\alpha$-occurrences.
Fig. 8.5. a) Completing the diagram of Figure 8.4(b), b) completing the diagram of Figure 8.4(a)

8.2.3 Efficiency 

Unfortunately, the two positive effects caused by the usage of the enhanced versions have to be paid for by a significant slow-down in the convergence speed of exhaustive iterations. To this end, let us start by investigating the maximal component transformations. According to Figure 8.5(b), maximal representatives also exist for consistent EUMECs. Since in this section we are mainly interested in the phenomena caused by critical edges, we will not investigate the construction of such transformations, which, in practice, can again be determined by means of bidirectional data flow analyses following the lines of Chapter 5. Denoting the simultaneous execution of maximal enhanced assignment motions by $\mathrm{EAM}^{\mu}(G)$ (cf. Definition 7.3.1), the general upper bound of Theorem 7.3.1 still applies to consistent EUMECs.

Theorem 8.2.3 (An Upper Bound for Consistent EUMECs).

For any consistent EUMEC an alternating sequence of simultaneous enhanced assignment motion and elimination steps, i.e. an iteration sequence in $(\mathrm{EL}^{\mu};\ \mathrm{EAM}^{\mu})^{*}(G)$, stabilises within

$$(2\,|N|\,\Delta + 1)$$

of both kinds of simultaneous iteration steps.

The remainder of this section, however, is devoted to the fact that even under the most severe assumption, i.e. for consistent EUMECs without EE-effects, this bound is asymptotically tight. In other words, critical edges reintroduce the “bad” parameter |N|, which means extra costs depending linearly on the size of the flow graph under consideration. We demonstrate that this effect can indeed be observed for the enhanced version of PFCE, which in the uncritical case served as our model class of MECs with maximum convergence speed.

Partial Faint Code Elimination and Critical Edges In Section 7.3 we proved that in $\mathfrak{FG}$ iteration sequences with regard to PFCE meet a linear bound that is even expected to be sublinear in practice (cf. Theorem 7.3.3). By contrast, we will see that for programs in $\mathfrak{FG}_{\mathit{crit}}$ the quadratic worst-case bound of Theorem 8.2.3 becomes asymptotically tight if we decide to use the enhanced versions for assignment sinking. First let us take a look at the program fragment in Figure 8.6, which is the basic component of this construction. The point of this figure is that the assignment i := i + 1 at node 1 has to wait k steps for “synchronisation” with its matching counterpart at node 2. Before that, the chain of increments to y has to be moved along the right branch to node 5. Note that these sinkings are enhanced ones, as y is dead along a path through node 4. Afterwards the assignments to i can be sunk to node 4, resolving the blockade at node 1. This finally enables the sinking of the x-increments along the left path to node 3.

Fig. 8.6. “Synchronisation delays” in enhanced PFCE

The above example provides the basic component of a more complex example. The program displayed in Figure 8.7 is essentially composed of several versions of the program fragment of Figure 8.6 together with a similar starting situation. For the sake of presentation, chains of assignments x := x + i + j and y := y + i + j are abbreviated by $A_x$ and $A_y$, respectively. The point of this example is that both sequences $A_x$ and $A_y$ are alternately blocked by assignments i := i + 1 and j := j + 1, respectively, that are waiting for “synchronisation” with a corresponding assignment. For example, at the beginning the sequence $A_x$ is blocked for k + 1 enhanced assignment motion steps, as illustrated in Figure 8.8. Afterwards, the code sequence $A_y$ is blocked at node 7 for another k + 1 enhanced assignment motion steps, waiting for the synchronisation of the blocking assignment to i. The situation after 2k + 2 iteration steps is displayed in Figure 8.9. Obviously, the above program scheme provides the following result:
Fig. 8.7. Illustrating slow convergence of enhanced PFCE: the starting situation

Theorem 8.2.4 (A Lower Bound for Enhanced PFCE). There is a family of flow graphs $(G_{i,j})$, where $G_{i,j}$ has $i$ nodes and $j = \ldots$, such that the number of simultaneous iteration steps that are necessary in order to reach a stable solution for PFCE is of order

$$\Omega(i \cdot j)$$

Recall that this contrasts significantly with the situation for flow graphs in $\mathfrak{FG}$, where the dependency on the number of basic blocks in the flow graph is completely absent. Therefore, let us seize the opportunity to take a look at the example of Figure 8.6 if synthetic nodes are inserted on the critical edges. Figure 8.10 shows the situation after k + 2 simultaneous assignment sinking and faint assignment elimination steps, i.e. a situation almost comparable to Figure 8.8. However, in Figure 8.10 the iteration process is already stable. The intuitive reason for this is that synthetic nodes can immediately resolve those situations causing “synchronisation delays”. For instance, the assignment i := i + 1 can move to the synthetic node on the edge (1,4) in the very first iteration. In fact, this situation has some similarities to Petri-net theory, where the absence of critical transitions is known as the free choice property [Rei85, Bes87]. Similarly to our application, many problems on Petri nets, for instance determining liveness or reachability, are known to be significantly easier to solve for free choice nets than for arbitrary ones.

Fig. 8.8. Figure 8.7 after k + 1 iterations
Fig. 8.9. Figure 8.7 after 2k + 2 iterations
Fig. 8.10. Speeding up convergence in Figure 8.7 through synthetic nodes:
stabilisation after k + 2 iterations
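The remedy discussed above, inserting synthetic nodes on critical edges, can be sketched in a few lines. The following Python sketch is an illustration under our own assumptions, not the monograph's construction. A flow graph is modelled as a dictionary that maps each node to its list of successors. An edge (n, m) counts as critical when n has more than one successor and m has more than one predecessor. The synthetic node names (`syn(n,m)`) and the function names are hypothetical, and parallel edges are assumed not to occur.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
Graph = Dict[str, List[str]]  # node -> list of successor nodes


def critical_edges(succ: Graph) -> List[Edge]:
    """Edges leading from a node with several successors to a node with several predecessors."""
    pred_count: Dict[str, int] = {}
    for targets in succ.values():
        for m in targets:
            pred_count[m] = pred_count.get(m, 0) + 1
    return [
        (n, m)
        for n, targets in succ.items()
        if len(targets) > 1
        for m in targets
        if pred_count.get(m, 0) > 1
    ]


def split_critical_edges(succ: Graph) -> Graph:
    """Return a copy of the graph with a fresh synthetic node on every critical edge.

    Assumes a simple graph (no parallel edges between the same pair of nodes).
    """
    result: Graph = {n: list(targets) for n, targets in succ.items()}
    for n, m in critical_edges(succ):
        syn = f"syn({n},{m})"  # hypothetical naming scheme for synthetic nodes
        result[n] = [syn if t == m else t for t in result[n]]
        result[syn] = [m]
    return result


if __name__ == "__main__":
    g: Graph = {"1": ["3", "4"], "2": ["4", "5"], "3": [], "4": [], "5": []}
    print(critical_edges(g))                        # [('1', '4'), ('2', '4')]
    print(critical_edges(split_critical_edges(g)))  # []
```

In this small hypothetical graph, both edges entering node 4 are critical. After splitting, none remain, and each former critical edge now has its own node on which code can be placed without affecting the other path into node 4.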



9. Conclusions and Perspectives 



In this monograph we examined how to cope with prominent interactions
in code motion in a systematic way. Common to all problems we studied
was the dilemma that, while exploiting interactions offers a great potential
for optimisation on the one hand, this has to be paid for with increased
conceptual and computational complexity on the other. Thus the design of an
interacting code motion algorithm requires capturing as much of the
optimisation potential as possible while keeping the schedule as economical as
possible. Our work was motivated by the observation that, as opposed to the
data flow analysis based design of the component transformations, there was no
foundation dealing with the effects of a system of interacting code motion
transformations in concert. For a broad class of practically relevant problems
in expression and assignment motion, this monograph offered rigorous techniques
for the design and analysis of aggressive algorithms in the presence of
interactions.



9.1 Summary of the Main Results 

In the first part of the monograph we investigated the problem of lifetime 
optimal expression motion in the presence of composite expressions and their 
subexpressions. Based on the observation that state-of-the-art techniques
[Cho83, Dha88, Dha89b, Dha91, DS88, MR79, MR81, Sor89, KRS94a] did 
not capture the interdependencies between symbolic registers used to keep 
the values of composite expressions and their subexpressions, we provided 
the first adequate characterisation of the problem. In a first approximation 
we tackled a variant of the problem where interdependencies were considered 
in a levelwise discipline. 

This naturally guided us to view the trade-off problem in terms of a graph
theoretical optimisation problem on bipartite graphs. Our application turned
out to be the opposite of the usual matching problem which, in essence, is to
find a maximum one-to-one assignment between two sets. In contrast, our problem
required identifying the parts (deficiency sets) that prevent the existence of
perfect matchings. Lovász's and Plummer's [LP86] excellent monograph on
matching theory was of great help in gaining insight into the problem and in
finding an efficient solution.
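To make the notion of a deficiency set concrete, here is a sketch of the textbook construction from matching theory. It is not the FFEM trade-off algorithm itself, which works on additional structure that is not modelled here. The sketch assumes a bipartite graph given as a map from left vertices to their right neighbours. It first computes a maximum matching with Kuhn's augmenting-path method. It then collects every left vertex that can be reached from an unmatched left vertex by an alternating path. The resulting set S satisfies |S| - |N(S)| = (number of left vertices) - (size of a maximum matching). Whenever S is non-empty, it therefore shows why the left side cannot be perfectly matched. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Bipartite = Dict[str, List[str]]  # left vertex -> adjacent right vertices


def max_matching(adj: Bipartite) -> Dict[str, str]:
    """Kuhn's augmenting-path algorithm; returns the matching as right -> left."""
    match_right: Dict[str, str] = {}

    def augment(u: str, seen: Set[str]) -> bool:
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be re-matched elsewhere
            if v not in match_right or augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in adj:
        augment(u, set())
    return match_right


def deficiency_set(adj: Bipartite) -> Tuple[Set[str], Set[str]]:
    """Left vertices reachable by alternating paths from unmatched ones, plus their neighbourhood."""
    match_right = max_matching(adj)
    matched_left = set(match_right.values())
    frontier = [u for u in adj if u not in matched_left]
    left_reach: Set[str] = set(frontier)
    right_reach: Set[str] = set()
    while frontier:
        u = frontier.pop()
        for v in adj[u]:
            if v in right_reach:
                continue
            right_reach.add(v)
            w = match_right[v]  # v is matched, otherwise the matching would not be maximum
            if w not in left_reach:
                left_reach.add(w)
                frontier.append(w)
    return left_reach, right_reach


if __name__ == "__main__":
    adj: Bipartite = {"a": ["x"], "b": ["x"], "c": ["x", "y"]}
    s, ns = deficiency_set(adj)
    print(sorted(s), sorted(ns))  # ['a', 'b'] ['x']
```

Here a and b compete for the single right vertex x. The set {a, b} therefore has deficiency 1, which matches the one left vertex that stays unmatched in every maximum matching.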



O. Ruething: Interacting Code Motion Transformations, LNCS 1539, pp. 211-213, 1998. 
© Springer- Verlag Berlin Heidelberg 1998 
The second important finding was that the full, i.e. non-levelwise, problem,
which at first glance looked significantly harder than the restricted case, could
be fully reduced to a two-level problem. This finally resulted in our FFEM
algorithm for computationally and lifetime optimal expression motion in the
presence of composite expressions.

In the second part of the monograph we presented a uniform framework for
reasoning about program transformations that are mainly based on the
motion of assignments. As applications we considered partial dead (faint) code
elimination, the elimination of partially redundant assignments and the
uniform elimination of partially redundant expressions and assignments.
Characteristic of assignment motion based transformations are their second order
effects, which are the reason why the transformations have to be applied
repeatedly until the process finally stabilises in order to fully exploit the
optimisation potential. Proving that this process is confluent, i.e. independent
of a specific application order, turned out to be quite an extensive task when
performed for the elimination of partially dead code [GKL+96]. Therefore,
we looked for a common confluence criterion that uniformly applies to all our
application areas. We found the solution in the consistency property, which
expresses that the elimination potential is preserved by all transformation
steps.
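A toy model shows why assignment-motion-based transformations must be applied repeatedly. The sketch below rests on our own simplifying assumptions. It handles straight-line code only, and a statement is a pair (assigned variable, variables read on its right-hand side). One simultaneous step removes every assignment whose value is never used afterwards. This is plain dead assignment elimination rather than the monograph's partial dead or faint code elimination, and the function names are hypothetical. Removing one dead assignment can make another one dead, which is a second order effect, so the step is iterated until the program stabilises.

```python
from typing import FrozenSet, List, Set, Tuple

Stmt = Tuple[str, FrozenSet[str]]  # (assigned variable, variables read on the right-hand side)


def is_dead(prog: List[Stmt], i: int, live_out: Set[str]) -> bool:
    """True if the value assigned by statement i is never read afterwards."""
    target = prog[i][0]
    for assigned, used in prog[i + 1:]:
        if target in used:
            return False  # the value is read later on
        if assigned == target:
            return True  # overwritten before any use
    return target not in live_out


def elimination_step(prog: List[Stmt], live_out: Set[str]) -> List[Stmt]:
    """One simultaneous step: drop all assignments that are dead in the current program."""
    return [s for i, s in enumerate(prog) if not is_dead(prog, i, live_out)]


def eliminate_exhaustively(prog: List[Stmt], live_out: Set[str]) -> Tuple[List[Stmt], int]:
    """Iterate the step until the program is stable; also return the number of effective steps."""
    steps = 0
    while True:
        nxt = elimination_step(prog, live_out)
        if nxt == prog:
            return prog, steps
        prog, steps = nxt, steps + 1


if __name__ == "__main__":
    # a := 1; b := a; c := b  with no variable live at the end
    chain: List[Stmt] = [("a", frozenset()), ("b", frozenset({"a"})), ("c", frozenset({"b"}))]
    print(eliminate_exhaustively(chain, set()))  # ([], 3)
```

Each step removes only the last assignment of the chain, so three effective steps are needed. If c is live at the end, nothing is removed and the program is stable immediately.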

In addition, we contributed a criterion that also guarantees that the
exhaustive iteration process stabilises reasonably fast. All our applications meet
this criterion, which, in essence, resulted from the observation that a special
class of second order effects, elimination-elimination effects, is restricted to
those that are already present in the initial program. In fact, our results
provided theoretical evidence for measurements indicating that in real-life
programs the iteration process stabilises within a small number of steps.

Finally, all our results were checked for the influence of critical edges. Besides
some well-known difficulties, two new massive drawbacks were discovered:
critical edges destroy the existence of lifetime optimal solutions in
expression motion, and critical edges may seriously slow down assignment motion
based transformations. On the other hand, the “classical” deficiency of
critical edges, namely that the bidirectional analyses they require are
conceptually and computationally more complex than their unidirectional
counterparts, was even diminished in the light of a novel technique whose idea
is to entirely eliminate bidirectional dependencies by introducing short-cuts
for the information flow along critical edges.

9.2 Perspectives 

There are a number of promising directions for future research which could
not, or not sufficiently, be treated in this monograph. In the following, let us
briefly discuss a few of them:
Reducing Register Pressure

The phase of trade-off-guided assignment sinkings as sketched in Section 7.4.3 
can be employed as a stand-alone technique to enhance register allocation 
[Cha82, CH90, Bri92b]. Classical register allocators are pinned to the situa- 
tion they find when starting their work. A lot of research has been devoted 
to improvements on the quality of the allocation process. However, even the 
most sophisticated allocator must fail if the underlying situation, i.e. the
register pressure on the symbolic registers at a program site, is already poor.
Using our code motion technique, one can probably avoid many potential
allocation conflicts in advance, leading to better register assignments. It is
planned to give empirical evidence for the benefits of this strategy. 
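Register pressure, in the sense used above, is the number of symbolic registers that are simultaneously live at a program site. On straight-line code it is easy to measure with a backward liveness pass. The sketch below is a simplified illustration under our own assumptions: straight-line code only, statements modelled as (assigned variable, variables read) pairs, and hypothetical names. It does not implement the trade-off-guided sinkings of Section 7.4.3. It compares two hand-chosen orderings of the same computation. In the first, t1, t2 and t3 are evaluated up front. In the second, each one is sunk down to its single use.

```python
from typing import FrozenSet, List, Set, Tuple

Stmt = Tuple[str, FrozenSet[str]]  # (assigned variable, variables read on the right-hand side)


def register_pressure(prog: List[Stmt], live_out: Set[str]) -> List[int]:
    """Number of live variables at each program point: before statement 0, ..., after the last one."""
    live = set(live_out)
    points = [len(live)]  # program point after the last statement
    for assigned, used in reversed(prog):
        live.discard(assigned)  # the definition kills the old value ...
        live |= used            # ... and its operands are live before it
        points.append(len(live))
    points.reverse()
    return points


E: FrozenSet[str] = frozenset()  # right-hand side without variable operands

# t1, t2, t3 are computed up front and consumed one after another
hoisted: List[Stmt] = [("t1", E), ("t2", E), ("t3", E),
                       ("u", frozenset({"t1"})), ("u", frozenset({"u", "t2"})), ("u", frozenset({"u", "t3"}))]
# the same computations, each sunk down to its single use
sunk: List[Stmt] = [("t1", E), ("u", frozenset({"t1"})), ("t2", E),
                    ("u", frozenset({"u", "t2"})), ("t3", E), ("u", frozenset({"u", "t3"}))]

if __name__ == "__main__":
    print(register_pressure(hoisted, {"u"}))  # [0, 1, 2, 3, 3, 2, 1]
    print(register_pressure(sunk, {"u"}))     # [0, 1, 1, 2, 1, 2, 1]
```

The peak number of simultaneously live values drops from three to two. A subsequent register allocator would inherit this better starting situation.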



Application Areas of the Trade-Off Algorithm 

We plan to explore further application areas of the graph matching technique 
for computing tight sets in bipartite graphs. Essentially, boiled down to its 
abstract kernel, the problem can be seen as a resource allocation task where 
the question of interest is to find the most profitable trade-off between some
substitutable resources that enter into a process. 

Semantic Expression Motion 

Finally, the ideas on lifetime optimal expression motion should be directly
transferable to the semantic variant of expression motion considered in
[SKR90, SKR91, KRS98]. At the moment the semantic algorithms are
restricted to busy expression motion. The reason for this is that in the semantic
setting there is no chance to succeed with a single object view, as here all
relevant values in a program have to be considered simultaneously. We
believe that the trade-off algorithm is the key to an adequate semantic variant
of lazy expression motion. However, the semantic setting adds a further
degree of difficulty to the problem. Large classes of equivalent expressions
can be split at program joins, which would force the number of symbolic
registers needed to keep the values to increase at this program point.
The consequences of this phenomenon have to be investigated.